Nvidia has officially introduced its next-generation Vera Rubin compute platform, an architecture built specifically to support agentic AI systems that reason, plan, and adapt rather than simply retrieve stored responses. The launch marks a major step in preparing infrastructure for the rapidly growing demands of intelligent, autonomous AI agents.
As artificial intelligence continues to evolve, compute requirements are expanding at an unprecedented pace. Nvidia designed the Vera Rubin architecture to address what it calls the three laws of AI scaling: pre-training, post-training, and test-time scaling. While early AI development focused mainly on training larger models, today’s systems increasingly rely on extended reasoning during inference, meaning models generate more tokens and spend more time “thinking” to deliver higher-quality results. This shift drives a dramatic rise in compute needs even after models are deployed.
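To make the test-time scaling point concrete, here is a minimal sketch of how inference compute grows with the number of generated tokens, using the common rule of thumb of roughly 2 × parameters FLOPs per generated token for a decoder-only transformer; the 1-trillion-parameter model size and token counts are illustrative assumptions, not Nvidia figures.

```python
# Minimal sketch (illustrative assumptions, not Nvidia figures): estimate how
# inference compute grows as a reasoning model generates more "thinking" tokens.
# A common rule of thumb for decoder-only transformers is about 2 * parameters
# FLOPs per generated token.

def inference_flops(num_params: float, generated_tokens: int) -> float:
    """Approximate total FLOPs to generate a full response."""
    return 2 * num_params * generated_tokens

MODEL_PARAMS = 1e12  # hypothetical 1-trillion-parameter model

for tokens in (500, 5_000, 50_000):  # short answer vs. extended reasoning
    print(f"{tokens:>6} tokens -> {inference_flops(MODEL_PARAMS, tokens):.2e} FLOPs")
```

Under this rough model, a response that uses a hundred times more reasoning tokens costs roughly a hundred times more compute per query, which is why reasoning workloads strain inference infrastructure long after training is done.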
During a virtual media briefing ahead of CES 2026, Dion Harris, Nvidia’s senior director of high-performance computing and AI hyperscale infrastructure, explained how the new platform supports this transition toward reasoning-based AI workloads. He introduced the Vera Rubin NVL72, a fully liquid-cooled, rack-scale system that integrates six specialized chips, including the new Vera CPU and Rubin GPU, to deliver tightly coupled performance across large AI clusters.
“Over the last year, we’ve seen an incredible leap in the intelligence of language models,” said Harris. “Top models like Kimi K2 Thinking employ reasoning during inference, generating more tokens for better answers. This increase in tokens requires an increase in compute.”
The Vera Rubin platform succeeds Nvidia’s current Blackwell architecture and delivers major performance improvements. The new Rubin GPU features ultra-fast high-bandwidth memory capable of reaching up to 22 terabytes per second, along with a third-generation transformer engine optimized for modern AI workloads. These enhancements enable the system to process increasingly complex models more efficiently across massive GPU clusters.
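The bandwidth figure matters because token-by-token decoding is typically memory-bound: each generated token must stream the active model weights from memory. A back-of-the-envelope roofline estimate, with the active parameter count and FP8 weight precision as illustrative assumptions, shows how the quoted 22 terabytes per second translates into a decode-speed ceiling.

```python
# Back-of-the-envelope roofline (illustrative, not an Nvidia benchmark):
# single-stream decode is usually memory-bound, so peak HBM bandwidth caps
# how fast tokens can be generated when every token streams the active weights.

HBM_BANDWIDTH = 22e12   # bytes/s, from the quoted 22 TB/s figure
ACTIVE_PARAMS = 40e9    # hypothetical active parameters per token (e.g., one MoE path)
BYTES_PER_PARAM = 1     # assuming FP8 weights

bytes_per_token = ACTIVE_PARAMS * BYTES_PER_PARAM
ceiling = HBM_BANDWIDTH / bytes_per_token
print(f"Decode ceiling: ~{ceiling:,.0f} tokens/s per stream")
```

Because the ceiling scales linearly with bandwidth, faster memory lifts per-stream token rates roughly in proportion, independent of raw FLOPs.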
According to Nvidia, Rubin offers five times faster inference performance and 3.5 times faster training throughput compared with Blackwell. This performance boost becomes especially critical for mixture-of-experts (MoE) models, which depend on large-scale, all-to-all GPU communication to activate specialized subnetworks for each task. To meet this demand, Vera Rubin emphasizes high-speed interconnects and tightly synchronized compute across nodes.
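To illustrate where that all-to-all communication comes from, here is a conceptual, single-process sketch of MoE token routing; in a real deployment each expert would live on a different GPU, and the per-expert grouping below would become the send buffers of an all-to-all exchange. The token count, expert count, and random router scores are illustrative assumptions.

```python
# Minimal sketch (conceptual, single process): mixture-of-experts token routing.
# In a real cluster each expert lives on a different GPU, so dispatching tokens
# to their selected experts requires an all-to-all exchange across devices,
# the communication pattern that Vera Rubin's interconnects target.

import numpy as np

rng = np.random.default_rng(0)
num_tokens, num_experts, top_k = 8, 4, 2

# A learned router would produce these scores; random values stand in here.
router_logits = rng.normal(size=(num_tokens, num_experts))

# Each token selects its top-k experts.
topk = np.argsort(router_logits, axis=1)[:, -top_k:]

# Group tokens by destination expert. In a distributed setting each group
# becomes a send buffer for the all-to-all dispatch step.
for expert in range(num_experts):
    assigned = np.where((topk == expert).any(axis=1))[0]
    print(f"expert {expert} (on GPU {expert}): tokens {assigned.tolist()}")
```

Because every token can be routed to experts on any GPU, each decoding step triggers cluster-wide exchanges, which is why interconnect speed, not just per-chip FLOPs, governs MoE throughput.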
“Rubin provides the performance necessary for the most demanding MoE models,” said Harris. “With the Vera Rubin architecture, we’re helping our partners and customers build the world’s largest, most advanced AI systems at the lowest cost.”
Beyond raw performance, the new platform also reflects Nvidia’s growing focus on energy efficiency and data-center scalability. By adopting full liquid cooling and rack-scale integration, Vera Rubin allows hyperscale providers and enterprise AI operators to deploy denser compute environments while managing thermal constraints more effectively.
Overall, Nvidia’s Vera Rubin platform positions the company to support the next wave of agentic AI, where systems move beyond static responses and actively reason, collaborate, and make decisions in real time. As AI applications increasingly rely on extended inference and multi-step problem solving, Nvidia’s latest architecture aims to ensure that infrastructure does not become the bottleneck in delivering intelligent, autonomous digital systems.