
NVIDIA Robotics Head: AI Agents Will Ignite the "ChatGPT Moment" in Robotics
NVIDIA is making a significant bet on AI agents, aiming to replicate the "ChatGPT moment" in robotics. Its business is shifting from hardware toward "air traffic control"-style agent-orchestration software intended to solve the challenges of large-scale deployment. Although world models still need iteration, the move opens up NVIDIA's commercial prospects for the next stage.
NVIDIA is extending its bets in the AI agent field to the robotics track, wagering that this technology can solve the core challenges of large-scale robot deployment.
According to The Information, Deepu Talla, Vice President of Robotics and Edge AI at NVIDIA, said in an interview during the annual GTC conference in San Jose, California, that AI agent systems are being built "digital-first," with robots a natural extension of those systems. He predicts that the involvement of AI agents will mark a turning point for the robotics industry similar to ChatGPT's impact on the AI industry, making robot deployment as simple as picking one up and putting it to work.
This statement further clarifies the direction of NVIDIA's AI strategy for the next phase. For investors, it means the narrative around NVIDIA's robotics business is shifting from hardware and simulation software toward higher-level agent-orchestration software, with the potential market and business models expected to expand accordingly.
AI Agents: The "Air Traffic Control" of Robots
Talla outlined two core values of AI agents in robotic scenarios. The first layer is the coding layer: agents can be used to build the "brain" of robots, automatically generating training data and evaluating robot AI models. NVIDIA announced this week that coding agents like Claude Code, OpenAI's Codex, and Cursor can now utilize its Osmo software to automate these functions.
The second layer is the orchestration layer: in multi-robot collaborative settings such as factories or warehouses, a single agent can act as "air traffic control," breaking overall goals down into specific tasks, assigning them to humanoid robots, industrial robotic arms, and other robot form factors, while ensuring robots do not collide with one another or with human workers. Talla noted that this orchestration function will run on cloud or on-premises servers, continuously simulating different strategies and issuing execution plans.
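The article does not describe how NVIDIA implements this layer. As a purely illustrative sketch (all class and variable names are hypothetical), the decompose-assign-deconflict pattern Talla describes might look like this: an orchestrator matches each task to an idle robot of the right type and reserves the task's workspace zone so two robots are never sent into the same area at once.

```python
from dataclasses import dataclass

@dataclass
class Robot:
    name: str
    kind: str            # e.g. "humanoid" or "arm"
    busy: bool = False

@dataclass
class Task:
    description: str
    required_kind: str   # what type of robot the task needs
    zone: str            # workspace zone the task occupies

class Orchestrator:
    """Toy 'air traffic control': assigns tasks to idle robots and
    reserves each task's zone to avoid sending two robots into the
    same area (a crude stand-in for collision avoidance)."""

    def __init__(self, robots):
        self.robots = robots
        self.reserved_zones = set()

    def assign(self, tasks):
        plan = []
        for task in tasks:
            if task.zone in self.reserved_zones:
                continue  # zone occupied: defer this task
            robot = next((r for r in self.robots
                          if r.kind == task.required_kind and not r.busy),
                         None)
            if robot is None:
                continue  # no idle robot of the required type
            robot.busy = True
            self.reserved_zones.add(task.zone)
            plan.append((robot.name, task.description))
        return plan
```

For example, with one humanoid and one arm available, two tasks in distinct zones get assigned and a third task in an already-reserved zone is deferred until the zone frees up. A real orchestrator would re-plan continuously against simulation, as the article notes, rather than make a single static pass.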
This direction is not unique to NVIDIA. Reports indicate that Amazon released DeepFleet last year—its self-developed warehouse robot coordination AI model, which is expected to improve robot operational efficiency by 10%.
Market Logic Behind the ChatGPT Analogy
Talla attributes the success of ChatGPT to two points: first, its versatility, allowing it to handle various tasks without specialized training; second, its extremely low barrier to entry, enabling anyone to start using it without prior learning. He believes that the robotics industry also needs to achieve breakthroughs in these two areas—having a general brain capable of reasoning and problem-solving, and making robot deployment sufficiently simple.
NVIDIA CEO Jensen Huang also said at the GTC conference, "In a few years, the idea of OpenClaw running inside robots will be quite obvious," referring to the popular open-source agent. At this year's conference, open-source agents (including NVIDIA's own NemoClaw) and robotics emerged as the two most closely watched themes.
It is worth noting that Talla acknowledged that agent orchestration cannot solve every challenge robots face: they still fall significantly short at manipulating small or soft objects and at operating safely around humans.
Cosmos World Model: Progress Is Uneven and the Model Still Needs to Mature
Regarding the world models that robot training relies on, Talla gave a cautious assessment of the current state of NVIDIA's Cosmos model. He noted that Cosmos was released in January 2025 and has been updated iteratively every two to three months since. As quality has improved with each version, the number of adopters has grown, but some companies still choose to wait three to six months for the next release.
Talla pointed out that Cosmos is a collection of different models covering capabilities such as reasoning, prediction, and 3D data generation, with varying levels of maturity. Whether it can meet the demands of a specific application depends on the use case.
On the structure of compute consumption, he said robotics companies' compute is currently concentrated in the model-training phase, since a general-purpose robot brain does not yet exist and the core bottleneck to building one is a lack of data. He predicts that with large-scale robot deployment, demand for simulation compute will show "hockey stick" growth, but "we are still far from the mass deployment of robots." This judgment is a useful reference for gauging medium-term demand for NVIDIA GPUs in robotics.
