NVIDIA launched Nemotron 3 Ultra on June 4, 2026, at Computex 2026, marking the company's most ambitious open-weights model release to date. The 550-billion-parameter model achieves over 300 output tokens per second through a hybrid Mamba-Transformer architecture with mixture-of-experts sparsity, positioning NVIDIA as a major player in both AI infrastructure and foundation models.
Hybrid Architecture Enables 10x Parameter Sparsity
Nemotron 3 Ultra employs a mixture-of-experts design that activates only 55 billion of its 550 billion total parameters per token, a 10x sparsity ratio. The architecture interleaves Mamba-2 state-space layers with selective attention layers, combining the efficiency of structured state-space models with the expressiveness of transformer attention. This hybrid approach enables the model to process a 1-million-token context window while maintaining throughput exceeding 300 output tokens per second.
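To make the sparsity figure concrete, the following is a minimal Python sketch of the arithmetic behind the 10x ratio, together with a toy top-k router of the kind mixture-of-experts models commonly use. Only the 550 billion, 55 billion, and 300 tokens-per-second figures come from the reporting above. The expert count, the top-k value, and the function names are hypothetical and chosen purely for illustration; NVIDIA's actual routing scheme is not described here. In a real model the active parameters also include shared layers that run for every token, so the expert ratio would not match the parameter ratio exactly.

```python
import math
import random

# Figures reported in the article.
TOTAL_PARAMS = 550e9      # total parameters
ACTIVE_PARAMS = 55e9      # parameters activated per token
TOKENS_PER_SECOND = 300   # reported lower bound on output throughput

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS   # share of weights used per token
sparsity_ratio = TOTAL_PARAMS / ACTIVE_PARAMS    # the "10x" figure


def seconds_to_generate(num_tokens, tokens_per_second=TOKENS_PER_SECOND):
    """Estimated wall-clock time to emit num_tokens at a fixed throughput."""
    return num_tokens / tokens_per_second


def top_k_route(logits, k):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)                 # subtract max for stability
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Hypothetical router configuration (not from the article).
NUM_EXPERTS = 40
TOP_K = 4

rng = random.Random(0)
router_logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
routes = top_k_route(router_logits, TOP_K)

print(f"active fraction: {active_fraction:.0%}, sparsity ratio: {sparsity_ratio:.0f}x")
print(f"time for 1,000 tokens: {seconds_to_generate(1000):.2f} s")
for expert, weight in routes:
    print(f"expert {expert:2d} weight {weight:.3f}")
```

Only the selected experts run for a given token, which is why per-token compute tracks the 55 billion active parameters rather than the full 550 billion.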
Highest Performance Among US Open-Weight Models
Nemotron 3 Ultra scored 48 on the Artificial Analysis Intelligence Index, making it the highest-performing US-built open-weight model as of its launch. However, NVIDIA acknowledged that the model still trails Chinese frontier models in benchmark performance. The company made the model available through Hugging Face, OpenRouter, and NVIDIA NIM, enabling broad access for developers and enterprises.
Enterprise Adoption and Integration
Glean added support for Nemotron 3 Ultra shortly after launch, signaling enterprise interest in the model's capabilities. AWS made it available on SageMaker JumpStart for cloud deployment. Industry reports suggested NVIDIA could form a major alliance with Apple following the launch, potentially integrating Nemotron 3 Ultra into Siri for improved natural language understanding.
Key Takeaways
- NVIDIA's Nemotron 3 Ultra features 550 billion total parameters, with 55 billion active per token through a mixture-of-experts architecture
- The model achieves over 300 output tokens per second with a 1-million-token context window
- Nemotron 3 Ultra scored 48 on the Artificial Analysis Intelligence Index, the highest of any US-built open-weight model at launch
- The hybrid Mamba-Transformer architecture interleaves Mamba-2 state-space layers with selective attention mechanisms
- Enterprise platforms including Glean and AWS SageMaker JumpStart added support for the model shortly after launch