Nvidia’s Cosmos Reason 2 aims to bring reasoning VLMs into the physical world




Nvidia CEO Jensen Huang said last year that we are now entering the age of physical AI. While the company continues to offer LLMs for software use cases, Nvidia is increasingly positioning itself as a provider of AI models for fully AI-powered systems, including agentic AI in the physical world.

At CES 2026, Nvidia announced a slate of new models designed to push AI agents beyond chat interfaces and into physical environments.

Nvidia launched Cosmos Reason 2, the latest version of its vision-language model designed for embodied reasoning. Cosmos Reason 1, released last year, introduced a two-dimensional ontology for embodied reasoning and currently leads Hugging Face’s physical reasoning for video leaderboard.

Cosmos Reason 2 builds on the same ontology while giving enterprises more flexibility to customize applications and enabling physical agents to plan their next actions, similar to how software-based agents reason through digital workflows.

Nvidia also released a new version of Cosmos Transfer, a model that lets developers generate training simulations for robots.

Other vision-language models, such as Google’s PaliGemma and Pixtral Large from Mistral, can process visual inputs, but not all commercially available VLMs support reasoning.

“Robotics is at an inflection point. We’re moving from specialist robots limited to single tasks to generalist specialist systems,” said Kari Briski, Nvidia vice president for generative AI software, in a briefing with reporters. She was referring to robots that combine broad foundational knowledge with deep task-specific skills. “These new robots combine broad fundamental knowledge with deep proficiency and complex tasks.”

She added that Cosmos Reason 2 “enhances the reasoning capabilities that robots need to navigate the unpredictable physical world.”

Moving to physical agents

Briski noted that Nvidia’s roadmap follows “the same pattern of assets across all of our open models.”

“In building specialized AI agents, a digital workforce, or the physical embodiment of AI in robots and autonomous vehicles, more than just the model is needed,” Briski said. “First, the AI needs the compute resources to train, simulate the world around it. Data is the fuel for AI to learn and improve and we contribute to the world’s largest collection of open and diverse datasets, going beyond just opening the weights of the models. The open libraries and training scripts give developers the tools to purpose-build AI for their applications, and we publish blueprints and examples to help deploy AI as systems of models.”

The company now has open models for physical AI in Cosmos; for robotics, with the open-reasoning vision-language-action (VLA) model Gr00t; and for agentic AI, with its Nemotron models.

Nvidia is making the case that open models across different branches of AI form a shared enterprise ecosystem that feeds data, training, and reasoning to agents in both the digital and physical worlds.

Additions to the Nemotron family

Briski said Nvidia plans to continue expanding its open models, including its Nemotron family, beyond reasoning to include a new RAG and embeddings model to make information more readily available to agents. The company released Nemotron 3, the latest version of its agentic reasoning models, in December.

Nvidia announced three new additions to the Nemotron family: Nemotron Speech, Nemotron RAG and Nemotron Safety.

In a blog post, Nvidia said Nemotron Speech delivers “real-time low-latency speech recognition for live captions and speech AI applications” and is 10 times faster than other speech models.

Nemotron RAG technically comprises two models: an embedding model and a rerank model, both of which can understand images to provide more multimodal insights that data agents will tap.

“Nemotron RAG is on top of what we call the MMTEB, or the Massive Multilingual Text Embedding Benchmark, with strong multilingual performance while using less computing power and memory, so they’re fit for systems that must handle a lot of requests very quickly and with low delay,” Briski said.

Nemotron Safety detects sensitive data so AI agents don’t accidentally unleash personally identifiable data.
