Nvidia Releases Cosmos 3: Open Physical AI Foundation Model Unifying World and Action Generation

Nvidia released Cosmos 3 on May 31, 2026, a physical AI foundation model that unifies capabilities previously split across separate models. It combines world understanding and action generation in a single architecture, and it accepts and produces text, image, video, audio, and action data for robotics, autonomous systems, and simulation applications.

Cosmos 3 Features Mixture-of-Transformers Architecture With Dual Towers

Cosmos 3 uses a Mixture-of-Transformers (MoT) architecture made up of two integrated towers: an autoregressive Reasoner Tower that functions as a vision-language model, and a diffusion-based Generator Tower that produces physics-aware video and action sequences. Because both towers live in one model, reasoning and generation tasks no longer require separate models.

The model comes in two configurations: Cosmos 3 Nano, with 16 billion parameters, optimized for real-time inference on workstation GPUs such as the RTX PRO 6000; and Cosmos 3 Super, with 64 billion parameters, designed for datacenter deployment on Hopper and Blackwell GPUs.

Nvidia Provides Comprehensive Open-Source Resources and Synthetic Datasets

Nvidia released complete training infrastructure including model checkpoints on Hugging Face, open-source training and post-training scripts on GitHub, and six synthetic datasets covering robotics, physics, spatial reasoning, human motion, autonomous driving, and warehouse operations. The company also provides NVIDIA NIM microservices for production deployment.

Performance optimizations include support for BF16, FP8, and NVFP4 precision, with NVFP4 quantization achieving up to a 2x speedup. The system integrates with vLLM for continuous batching and tensor parallelism, and it uses Efficient Video Sampling to reduce the number of video tokens. Post-training workflows cover supervised fine-tuning, action post-training for dynamics and policy generation, and further customization.
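To give a rough sense of what these precision formats mean for the two model sizes, the following back-of-the-envelope sketch estimates weight-only memory. It assumes nominal widths of 16, 8, and 4 bits per parameter; block-scaled formats such as NVFP4 also store scale factors, and activations and caches add further memory, so real footprints will be larger. The function and constant names are illustrative and are not part of any Nvidia tooling.

```python
# Nominal bits per weight for each format named in the article.
# Assumption: NVFP4 is counted as exactly 4 bits, ignoring the extra
# scale factors that block-scaled formats store alongside the weights.
BITS_PER_WEIGHT = {"BF16": 16, "FP8": 8, "NVFP4": 4}

# Parameter counts as stated in the article.
MODELS = {"Cosmos 3 Nano": 16e9, "Cosmos 3 Super": 64e9}


def weight_gigabytes(num_params: float, fmt: str) -> float:
    """Return the weight-only footprint in GB (10**9 bytes)."""
    return num_params * BITS_PER_WEIGHT[fmt] / 8 / 1e9


if __name__ == "__main__":
    for name, params in MODELS.items():
        for fmt in BITS_PER_WEIGHT:
            gb = weight_gigabytes(params, fmt)
            print(f"{name:15s} {fmt:6s} {gb:6.1f} GB")
```

Under these assumptions, each halving of the bit width halves the weight footprint, which is one reason lower-precision formats make larger models practical on a fixed memory budget.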

Cosmos 3 Leads Multiple Benchmarks Including VANTAGE-Bench and R-Bench

Cosmos 3 demonstrates state-of-the-art performance across multiple evaluation frameworks:

Tops VANTAGE-Bench for reasoning on real-world fixed-camera footage
Leads Traffic Anomaly Reasoning (TAR) for the AI City Challenge 2026
Ranks first on the Artificial Analysis leaderboard for open-source text-to-image and image-to-video models
Achieves top scores on R-Bench, PAI-Bench, Physics-IQ, and RoboLab benchmarks

Nvidia also introduced the Cosmos Human Evaluation (HUE) framework, which evaluates generated video through atomic binary verification, that is, individual pass/fail checks, across four dimensions: semantic alignment, physical laws, geometric reasoning, and visual integrity. The framework is intended to address the limitations of automated leaderboards.
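One way to picture atomic binary verification is as a list of yes/no checks, each tagged with one of the four dimensions, that are then summarized per dimension. The sketch below is only an illustration of that idea: the `Check` schema, the example questions, and the use of a simple pass rate are all assumptions, since the announcement does not describe how HUE records or aggregates its judgments.

```python
from collections import defaultdict
from dataclasses import dataclass

# The four evaluation dimensions named in the article.
DIMENSIONS = (
    "semantic alignment",
    "physical laws",
    "geometric reasoning",
    "visual integrity",
)


@dataclass(frozen=True)
class Check:
    """One atomic yes/no judgment about a generated video (hypothetical schema)."""
    dimension: str
    question: str
    passed: bool


def pass_rates(checks):
    """Return the fraction of passed checks per dimension (None if it has no checks)."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for check in checks:
        if check.dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {check.dimension}")
        totals[check.dimension] += 1
        passes[check.dimension] += int(check.passed)
    return {d: (passes[d] / totals[d] if totals[d] else None) for d in DIMENSIONS}


if __name__ == "__main__":
    # Invented example judgments, for illustration only.
    example = [
        Check("semantic alignment", "Does the arm pick up the cube named in the prompt?", True),
        Check("physical laws", "Does the cube fall when released?", True),
        Check("physical laws", "Do objects avoid passing through the table?", False),
        Check("visual integrity", "Is the clip free of flickering artifacts?", True),
    ]
    for dimension, rate in pass_rates(example).items():
        print(dimension, rate)
```

Keeping each judgment binary and narrowly scoped makes individual answers easy to audit, at the cost of needing many checks per video.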

Key Takeaways

Nvidia Cosmos 3 unifies world understanding and action generation in a single physical AI model, released May 31, 2026
Available in two sizes: Cosmos 3 Nano (16B parameters) for workstation GPUs and Cosmos 3 Super (64B parameters) for datacenter deployment
Nvidia provides complete open-source infrastructure including model checkpoints, training scripts, and six synthetic datasets covering robotics and autonomous systems
NVFP4 quantization delivers up to 2x speedup while maintaining performance across benchmarks
Cosmos 3 leads multiple industry benchmarks including VANTAGE-Bench, Traffic Anomaly Reasoning, and the Artificial Analysis leaderboard for open-source models
