Waymo partners with DeepMind on a world model: built on Genie 3, it lets autonomous driving "imagine" rare scenarios

Wallstreetcn
2026.02.07 12:19

Waymo has partnered with DeepMind to launch the Waymo World Model, built on Genie 3 and aimed at enhancing autonomous driving simulation. The model can generate highly realistic 3D environments, simulate rare events such as tornadoes and encounters with elephants, and produce high-fidelity multi-sensor data. Waymo says its autonomous driving system has already driven billions of miles in virtual environments, and describes the Waymo World Model as the core infrastructure that helps the system handle complex traffic scenarios in the real world.

Waymo, Alphabet's autonomous driving company, has just launched its latest world model, the Waymo World Model. Built on DeepMind's Genie 3, it sets a new industry benchmark for large-scale, ultra-realistic autonomous driving simulation.

DeepMind CEO and Nobel laureate Demis Hassabis reposted the announcement, calling this Genie 3-based use case "super cool."

The Waymo World Model is built on Google DeepMind's general-purpose world model Genie 3, which can generate highly realistic, interactive 3D environments, and has been specialized to meet the stringent requirements of autonomous driving. Drawing on Genie's rich world knowledge, it can simulate extremely rare events, from tornadoes to encounters with elephants, that are nearly impossible to reproduce at scale in reality.

At the same time, the model is highly controllable: engineers can quickly adjust simulation content through simple language prompts, driving inputs, or scene layouts. More importantly, the Waymo World Model supports the generation of high-fidelity, multi-sensor data, including camera images and LiDAR point clouds, providing a comprehensive and realistic training and testing environment for autonomous driving systems.
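To make the controls and output modalities described above concrete, here is a minimal sketch of what a simulation request to such a model might look like. It is purely illustrative: Waymo has not published an API, and every name here (SimulationRequest, reset, step, render) is a hypothetical placeholder.

```python
from dataclasses import dataclass

# Hypothetical sketch only: Waymo has not published an API for the Waymo
# World Model. Class and method names below are illustrative assumptions.

@dataclass
class SimulationRequest:
    language_prompt: str                  # e.g. "heavy rain at dusk"
    driving_inputs: list                  # (steering, acceleration) per step
    scene_layout: dict | None = None      # optional road / agent layout
    sensors: tuple = ("camera", "lidar")  # requested output modalities
    num_steps: int = 200


def run_simulation(model, request: SimulationRequest) -> dict:
    """Roll a world model forward and collect per-sensor outputs."""
    outputs = {name: [] for name in request.sensors}
    state = model.reset(prompt=request.language_prompt,
                        layout=request.scene_layout)
    for step in range(request.num_steps):
        action = (request.driving_inputs[step]
                  if step < len(request.driving_inputs) else (0.0, 0.0))
        state = model.step(state, action)  # advance the simulated world
        for name in request.sensors:
            outputs[name].append(model.render(state, sensor=name))
    return outputs
```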

Waymo says the Waymo Driver has accumulated nearly 200 million miles of fully autonomous driving, now operates in several major U.S. cities, and continues to improve road safety. What the public rarely sees, however, is that before driving on public roads, the system has already driven billions of miles in the virtual world, repeatedly rehearsing complex, rare, and even extreme traffic scenarios. The Waymo World Model is the core infrastructure behind this capability, letting the autonomous driving system master its responses to the real world in advance, outside of reality.

Next, let's look at how the Waymo World Model performs in practice, including how the Waymo Driver is run through simulations of various rare and extreme edge cases.

Emergent Multimodal World Knowledge

Most simulation models in the autonomous driving industry are trained from scratch solely on road data the companies collect themselves, which means the system can only learn from a limited amount of real-world experience. In contrast, Genie 3 is pre-trained on an extremely large and diverse set of video data, giving it broad world knowledge that lets it explore scenarios the fleet has never directly experienced.

Through a specially designed post-training process, Waymo has transferred this vast 2D video world knowledge into the distinctive 3D LiDAR output of Waymo's hardware suite. Cameras excel at capturing rich visual detail, while LiDAR provides valuable complementary signals such as precise depth. As a result, the Waymo World Model can generate almost any scenario across multiple sensor modalities, from everyday driving to extremely rare "long-tail" situations.
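As a rough illustration of what "post-training a 2D video model to emit LiDAR" could involve, the toy PyTorch sketch below attaches a point-cloud decoder head to a placeholder video backbone and fine-tunes both on paired camera and LiDAR logs. Neither the architecture nor the loss reflects Waymo's actual recipe, which has not been disclosed.

```python
import torch
import torch.nn as nn

# Conceptual toy only: neither the Genie 3 architecture nor Waymo's
# post-training recipe is public. "VideoBackbone" stands in for a pretrained
# 2D video world model, and the LiDAR head and loss are assumptions.

class VideoBackbone(nn.Module):
    """Placeholder for a pretrained video world model (kept trainable here)."""
    def __init__(self, in_dim: int = 3 * 64 * 64, dim: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(in_dim, dim), nn.ReLU())

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        return self.encoder(frames)  # latent world state per clip


class LidarHead(nn.Module):
    """Decodes the latent world state into an N x 3 point cloud."""
    def __init__(self, dim: int = 256, num_points: int = 1024):
        super().__init__()
        self.num_points = num_points
        self.decoder = nn.Linear(dim, num_points * 3)

    def forward(self, latent: torch.Tensor) -> torch.Tensor:
        return self.decoder(latent).view(-1, self.num_points, 3)


backbone, lidar_head = VideoBackbone(), LidarHead()
optimizer = torch.optim.AdamW(
    list(backbone.parameters()) + list(lidar_head.parameters()), lr=1e-4)

# One post-training step on a paired (camera clip, logged LiDAR sweep) batch.
frames = torch.randn(4, 3, 64, 64)       # dummy camera frames
logged_points = torch.randn(4, 1024, 3)  # dummy logged LiDAR sweep
pred_points = lidar_head(backbone(frames))
# A real setup would use a point-cloud loss such as Chamfer distance;
# plain MSE keeps the sketch short.
loss = nn.functional.mse_loss(pred_points, logged_points)
loss.backward()
optimizer.step()
```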

Examples of Extreme Weather and Natural Disasters

A vehicle drives across a lightly snow-covered Golden Gate Bridge, with the Waymo vehicle's shadow visible in the front camera view:

In extreme weather, a vehicle encounters a tornado:

Rare and Safety-Critical Events

While the vehicle is operating, a reckless driver makes an improper maneuver and runs off the road:

While driving along the road, the vehicle encounters a malfunctioning truck reversing and blocking the road:

Rare Encounters with Animals and Objects, Such as Elephants and Longhorns

A vehicle driving on the road encounters an elephant head-on:

A vehicle driving on the road meets a Texas longhorn:

Powerful Simulation Control

The Waymo world model provides powerful simulation control. This relies on three main mechanisms: driving behavior control, scene layout control, and language control.

Driving behavior control creates a responsive simulator that follows specific driving inputs. This makes it possible to simulate counterfactual events, for example whether the Waymo driving system could drive more confidently and safely in a given situation instead of yielding.

Counterfactual driving: below, Waymo demonstrates simulation results for both the originally recorded driving path and a completely new path. Pure reconstruction-based simulation methods (such as 3D Gaussian splatting, or 3DGS) readily suffer visual distortion from missing observational data when the simulated path diverges significantly from the original drive, whereas Waymo's fully learning-based world model maintains good realism and consistency thanks to its strong generative capabilities.
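A minimal sketch of counterfactual re-simulation, assuming the same hypothetical reset/step interface as the earlier sketch: the logged driving inputs and an alternative plan are rolled out from the same starting state so their outcomes can be compared.

```python
# Hypothetical sketch of counterfactual re-simulation. `world_model` follows
# the illustrative reset/step interface from the earlier sketch; the names
# are assumptions, not Waymo's API.

def rollout(world_model, start_state, actions):
    """Advance the simulated world with a fixed sequence of driving inputs."""
    states = [start_state]
    for action in actions:
        states.append(world_model.step(states[-1], action))
    return states


def counterfactual_compare(world_model, logged_scenario, alternative_actions):
    """Replay the recorded drive, then re-simulate a different plan (e.g.
    proceeding assertively instead of yielding) from the same starting state."""
    start = world_model.reset(layout=logged_scenario.initial_layout)
    replayed = rollout(world_model, start, logged_scenario.recorded_actions)
    counterfactual = rollout(world_model, start, alternative_actions)
    return replayed, counterfactual
```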

Scene layout control allows the road layout, traffic light states, and the behavior of other road users to be customized. In this way, tailored scenes can be created by selectively placing other road users or applying custom variations to the road layout.

Scene-layout-conditioned control:
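As a concrete, purely illustrative example, a scene-layout specification of the kind described above might look like the following; the schema and field names are assumptions, not Waymo's actual format.

```python
# Illustrative only: a scene-layout specification of the kind described above.
# The schema and field names are assumptions, not Waymo's actual format.

scene_layout = {
    "road": {
        "num_lanes": 2,
        # A custom variation applied to the road layout.
        "lane_closure": {"lane": 1, "start_m": 40.0, "end_m": 80.0},
    },
    "traffic_lights": [
        {"id": "intersection_1", "state": "red", "switch_to": "green", "at_s": 6.0},
    ],
    # Other road users placed selectively into the scene.
    "agents": [
        {"type": "pedestrian", "position_m": (35.0, 2.5), "behavior": "crossing"},
        {"type": "vehicle", "position_m": (60.0, 0.0), "behavior": "double_parked"},
    ],
}

# The layout would then condition the rollout, e.g. (hypothetical call):
# state = world_model.reset(layout=scene_layout)
```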

Language control is the most flexible tool in Waymo's world model, which can be used to adjust the time of day, weather conditions, and even generate completely synthetic scenes (such as the long-tail scenes shown earlier).

World variation: Time

World variation: Weather
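The time and weather variations above amount to prompt-only edits. A minimal sketch, assuming a hypothetical generate() interface that is not a published API:

```python
# Hypothetical sketch of prompt-driven world variation like the time and
# weather examples above. The prompts are examples and generate() is an
# assumed interface, not a published API.

base_scene = "four-way intersection in downtown San Francisco, light traffic"

variations = {
    "time_morning":  f"{base_scene}, early morning, low sun and long shadows",
    "time_night":    f"{base_scene}, night, wet asphalt reflecting headlights",
    "weather_fog":   f"{base_scene}, dense fog, visibility under 50 meters",
    "weather_storm": f"{base_scene}, heavy rain with gusting wind",
}

# for name, prompt in variations.items():
#     clips[name] = world_model.generate(prompt=prompt, sensors=("camera", "lidar"))
```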

Conversion of Dashcam Videos

On a scenic trip, people often record videos with their phones or dashcams, capturing snow-walled roads or a highway at sunset. Waymo's world model can convert such footage, or any video taken with an ordinary camera, into a multimodal simulation that shows what the Waymo Driver would "see" in the same scene. Waymo says that because the simulation is derived directly from real images, this process achieves the highest level of realism and factual accuracy.
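A minimal sketch of what such a conversion pipeline could look like, assuming a hypothetical condition_on_video() entry point; OpenCV is used only to read the clip.

```python
# Sketch only: turning an ordinary phone or dashcam clip into a multi-sensor
# simulation. OpenCV is used just to read frames; condition_on_video() and
# render() are assumed interfaces for illustration.

import cv2


def load_frames(video_path: str) -> list:
    """Read a regular video file into a list of RGB frames."""
    capture = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ok, frame_bgr = capture.read()
        if not ok:
            break
        frames.append(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    capture.release()
    return frames


def convert_dashcam(world_model, video_path: str) -> dict:
    """Anchor the simulation to real footage, then render Waymo's sensor views."""
    state = world_model.condition_on_video(load_frames(video_path))
    return {
        "camera": world_model.render(state, sensor="camera"),
        "lidar": world_model.render(state, sensor="lidar"),
    }
```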

Scalable Inference

Some scenarios take a long time to play out fully in simulation, such as negotiating a narrow lane. Long simulations are harder to run well because the computational burden grows with simulation time, making stable, high quality more difficult to maintain. Efficient variants of the Waymo World Model, however, can simulate longer scenes at a significantly reduced computational cost while preserving high realism and fidelity, which supports simulation at scale.
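One common way to bound the cost of long rollouts is to condition each step on a fixed window of recent states rather than the full history; whether Waymo's efficient variant works this way is not stated. A sketch under that assumption:

```python
from collections import deque

# Hedged sketch of one common way to keep long rollouts tractable: condition
# each step on a bounded window of recent states instead of the full history,
# so per-step cost stays roughly constant as the scene gets longer. Whether
# Waymo's efficient variant works this way is not stated; this is illustrative.

def long_rollout(world_model, start_state, actions, context_window: int = 32):
    """Roll out a long scenario while keeping only a fixed-size history."""
    history = deque([start_state], maxlen=context_window)
    outputs = []
    for action in actions:
        # The model only ever sees the most recent `context_window` states.
        next_state = world_model.step_with_context(list(history), action)
        history.append(next_state)
        outputs.append(next_state)
    return outputs
```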

Long-duration simulation (4x speed) using the efficient variant:

By simulating these extremely rare situations, Waymo Driver can prepare in advance for complex, long-tail driving scenarios. This capability sets stricter safety benchmarks for autonomous driving systems, ensuring they are equipped to handle similar challenges before encountering them on real roads.

Risk Warning and Disclaimer

Markets carry risk; invest with caution. This article does not constitute personal investment advice and does not take into account individual users' specific investment objectives, financial situation, or needs. Users should consider whether any opinions, views, or conclusions in this article are appropriate to their particular circumstances. Any investment made on this basis is at the user's own risk.