
Alibaba's latest move targets embodied intelligence, aiming at deep environmental understanding

Introduces two core capabilities: spatiotemporal memory and physical world reasoning
Author | Huang Yu
The "high wall of intelligence" that has long stood in the field of embodied intelligence is gradually being broken down.
On February 10, Alibaba's Damo Academy officially released the foundational model for embodied intelligence, RynnBrain, and simultaneously open-sourced a full series of 7 models, including the industry's first 30B MoE (Mixture of Experts) architecture.
The release is a significant milestone. According to reports, RynnBrain gives robots spatiotemporal memory and spatial reasoning capabilities for the first time, and it set new state-of-the-art (SOTA) results on 16 open-source embodied intelligence benchmarks, surpassing leading industry models such as Google's Gemini Robotics ER 1.5.
This means that the long-standing shackles of "spatiotemporal forgetting" and "physical hallucination" in embodied intelligence are being loosened, and robotic brains are expected to evolve from simple command receivers into intelligent agents with a deep understanding of their environment.
For a long time, the limited intelligence of embodied models has been a major bottleneck for robots. In particular, their weak generalization greatly limits their use in complex physical scenarios.
To break through this bottleneck, multiple technical exploration routes have emerged in the industry.
According to Wall Street Insights, one route focuses on VLA (vision-language-action) models that output actions and can operate directly in the physical world, but high-quality robot data is scarce, which makes cross-scenario generalization extremely difficult. Another route introduces "brain" models such as VLMs (vision-language models), which have the potential to generalize, but these models generally lack memory, have a limited grasp of dynamic scenes, and commonly suffer from physical hallucinations, so they struggle to support complex mobile manipulation by humanoid robots.
This high wall, built from defects in the intelligence architecture, means that even seemingly advanced robots still struggle with complex mobile manipulation.
The RynnBrain model from Alibaba's Damo Academy is designed to fundamentally break down this wall.
It is reported that RynnBrain creatively introduces two core capabilities: spatiotemporal memory and physical world reasoning, which are essential for deep interaction between robots and their environments.
Spatiotemporal memory refers to the robot's ability to locate objects, trace back to target areas, and even predict motion trajectories within a complete historical memory, thus endowing the robot with global spatiotemporal recall capabilities.
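To make the three queries named above concrete (locating an object, looking back through history, and predicting a trajectory), here is a minimal sketch. It is purely illustrative and is not how RynnBrain works internally: a foundation model would hold this information in learned representations, whereas the sketch uses an explicit observation log. The names `Observation` and `SpatiotemporalMemory` are hypothetical, and the prediction is a simple linear extrapolation chosen for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    t: float      # timestamp in seconds
    label: str    # object name, e.g. "cup"
    x: float      # position in an arbitrary 2D frame
    y: float


class SpatiotemporalMemory:
    """Hypothetical explicit log of everything the robot has seen."""

    def __init__(self):
        self._log = []

    def observe(self, obs):
        self._log.append(obs)

    def history(self, label):
        """All observations of `label`, oldest first."""
        return sorted((o for o in self._log if o.label == label),
                      key=lambda o: o.t)

    def locate(self, label):
        """Most recent observation of `label`, or None if never seen."""
        seen = self.history(label)
        return seen[-1] if seen else None

    def predict(self, label, t_future):
        """Linearly extrapolate position from the last two observations."""
        seen = self.history(label)
        if len(seen) < 2:
            return None
        a, b = seen[-2], seen[-1]
        dt = b.t - a.t
        if dt == 0:
            return (b.x, b.y)
        vx, vy = (b.x - a.x) / dt, (b.y - a.y) / dt
        horizon = t_future - b.t
        return (b.x + vx * horizon, b.y + vy * horizon)


memory = SpatiotemporalMemory()
memory.observe(Observation(0.0, "cup", 1.0, 1.0))
memory.observe(Observation(1.0, "ball", 0.0, 0.0))
memory.observe(Observation(2.0, "cup", 2.0, 1.5))

last_cup = memory.locate("cup")          # the observation at t = 2.0
future_cup = memory.predict("cup", 4.0)  # extrapolated position at t = 4.0
```

The point of the sketch is only that "memory" here means being able to answer where something was, when, and where it is heading, rather than reacting to the current frame alone.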
Physical space reasoning differs from the traditional text-only reasoning paradigm: RynnBrain uses a reasoning strategy that interleaves text with spatial localization, which keeps its reasoning grounded in the physical environment and significantly reduces hallucinations.
For example, if a robot running RynnBrain is interrupted while performing task A and is asked to complete task B first, it can accurately remember the time and spatial state of task A and seamlessly resume work after task B is completed. This "long memory" mechanism addresses the long-standing problem of "instant amnesia" in embodied intelligence.
In addition, according to Wall Street Insights, RynnBrain is trained on top of Qwen3-VL and is deeply optimized with RynnScale, an architecture developed by Damo Academy, which doubles training speed on the same computing resources. The training data exceeds 20 million pairs.
This efficient training system shows up directly in the evaluation results: RynnBrain set new industry records across 16 evaluations spanning environmental perception, object reasoning, first-person visual question answering, spatial reasoning, and trajectory prediction. This reflects not just more computing power but a successful rework of the underlying architecture of embodied intelligence.
It is reported that RynnBrain also scales well: it can be quickly post-trained into various embodied models for navigation, planning, and action, and it is expected to become a foundational model for the embodied intelligence industry.
In its pursuit of creating a foundational model for the embodied intelligence industry, Damo Academy has chosen the open-source route.
It is reported that Damo Academy has open-sourced the entire RynnBrain series, 7 models in total, including foundational models at every size and specialized post-trained models. Among them is the industry's first 30B embodied model with an MoE architecture, which activates only 3B parameters at inference yet can outperform 72B models in the industry, allowing robots to move faster and more smoothly.
At the same time, Damo Academy has also open-sourced a new evaluation benchmark, RynnBrain-Bench, for assessing fine-grained spatiotemporal embodied tasks, filling a gap in the industry.
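Why a 30B model can run with only about 3B parameters active comes down to how a Mixture of Experts layer routes each input to a few experts and skips the rest. The following is a minimal sketch of generic top-k routing, not RynnBrain's actual implementation. The expert count, the value of `k`, the gate scores, and the function names `top_k_route` and `moe_forward` are all hypothetical and chosen only for illustration.

```python
import math


def top_k_route(gate_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    ranked = sorted(range(len(gate_logits)),
                    key=lambda i: gate_logits[i], reverse=True)
    top = ranked[:k]
    # Softmax over the selected logits only, so the weights sum to 1.
    peak = max(gate_logits[i] for i in top)
    exps = [math.exp(gate_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, experts, gate_logits, k):
    """Weighted sum of the k selected experts' outputs; the others never run."""
    return sum(w * experts[i](x) for i, w in top_k_route(gate_logits, k))


# Hypothetical toy setup: 8 scalar "experts", 2 of them active per input.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
gate_logits = [0.1, 2.0, -1.0, 0.3, 2.0, 0.0, -0.5, 0.2]

routes = top_k_route(gate_logits, k=2)
output = moe_forward(3.0, experts, gate_logits, k=2)

# Figures from the article: about 3B of 30B parameters active at inference.
active_fraction = 3 / 30
```

With the article's figures, roughly a tenth of the model's parameters do work on any given step, which is where the speed advantage over a dense model of similar total size comes from.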
The large-scale open-sourcing by Alibaba's Damo Academy clearly reflects a grander industry ambition, which is to accelerate the construction of an open and evolvable embodied intelligence ecosystem.
From the perspective of global technological competition, embodied intelligence is at a critical turning point, moving from the digital and virtual to the physical.
Zhao Deli, head of the embodied intelligence laboratory at Damo Academy, said that RynnBrain has for the first time achieved a deep understanding of the physical world and reliable planning within it, a key step towards general embodied intelligence under a hierarchical "cerebrum and cerebellum" architecture. "We look forward to it accelerating the process of AI moving from the digital world to real physical scenarios."
In 2017, on the 18th anniversary of Alibaba's founding, Jack Ma established Damo Academy, dedicated to solving technology and research problems that advance productivity. At the time, Alibaba also pledged to invest 100 billion yuan in Damo Academy within three years.
However, over the past three years, amid significant organizational changes within Alibaba Group, Damo Academy has also gone through multiple adjustments and reshuffles. Its once broad "4+X" research fields have been narrowed to just "Intelligence + Computing": the intelligence direction includes medical AI, decision intelligence, video technology, embodied intelligence, and genetic intelligence, while the computing direction includes computing technology, RISC-V, and others.
Embodied intelligence is clearly one of the key areas of investment for Damo Academy today.
It is understood that in the field of embodied intelligence, Damo Academy is building a deployable, scalable, and evolvable embodied intelligence system. It has already open-sourced embodied models such as WorldVLA, which integrates a world model with a VLA model, and the world understanding model RynnEC, as well as RynnRCP, the industry's first robot context protocol.
While Damo Academy focuses on embodied intelligence, the global humanoid robot market has also entered a critical phase of scaled development. The year 2025 marked the starting point of large-scale growth for the global humanoid robot market.
According to IDC data, the global humanoid robot shipment volume last year approached 18,000 units, a year-on-year increase of approximately 508%, with sales revenue of about $440 million; during the same period, the cumulative sales order volume is expected to exceed 35,000 units.
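As a quick sanity check on these figures, the short calculation below derives two numbers the IDC data implies but the article does not state: the prior-year shipment volume and the average revenue per unit. Both inputs are reported as approximate, so the results are rough estimates, not IDC figures.

```python
# Figures as reported in the article (both are stated as approximate).
shipments = 18_000            # units shipped last year
yoy_increase = 5.08           # a 508% year-on-year increase
revenue_usd = 440_000_000     # sales revenue in US dollars

# A 508% increase means last year's volume is 6.08 times the prior year's.
prior_year_shipments = shipments / (1 + yoy_increase)
avg_revenue_per_unit = revenue_usd / shipments

print(f"Implied prior-year shipments: about {prior_year_shipments:,.0f} units")
print(f"Implied average revenue per unit: about ${avg_revenue_per_unit:,.0f}")
```

In other words, the reported growth implies a base of only about 3,000 units the year before, and an average selling price in the neighborhood of $24,000 per robot.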
Although this field still faces many challenges, such as the scarcity of real physical feedback data, generalization in unstructured environments, and deep collaboration between hardware and software, RynnBrain's open-source platform undoubtedly provides a relatively mature "brain template" for global developers, aiding in the accelerated commercialization of embodied intelligence.
For the industry, this is not only a release of code but also a redistribution of technological power. When top models are no longer secret weapons in the laboratories of giants, the embodied intelligence industry will enter a new cycle of accelerated iteration and collective evolution.
