OpenAI Co-founder: Autonomous driving and VR are just "diversions," AI agents are the future.
Andrej Karpathy believes AI agents represent a crazy future, and that now is the moment to return to neuroscience and seek inspiration from it.
After NVIDIA's Jensen Huang and Tesla highlighted the development potential of AI agents, Andrej Karpathy, co-founder of OpenAI and former Director of AI at Tesla, recently declared that AI agents represent a crazy future.
Andrej Karpathy admitted that during his time at Tesla, he was "distracted by autonomous driving" and that researching autonomous driving and VR was not the right path for developing AI agents. He believes that now is the time to return to neuroscience and seek inspiration from it.
Karpathy also believes that individual developers have an edge over companies like OpenAI when it comes to building AI agents. For now the field is open to all competitors on equal terms, and he looks forward to seeing what it produces:
AI agents represent a crazy future. Although it may still be a bit far off, the AI agents built by everyone present today are at the forefront of AI agent capabilities.
Currently, all institutions working on large language models, such as OpenAI, are not at the forefront of this field. The people present here are at the forefront.
DeepMind, Google's AI lab, recently published a paper introducing RoboCat, an AI agent capable of self-improvement. RoboCat is essentially AI-powered software that serves as the "brain" of a robot. Unlike traditional robots, it is more general-purpose and can improve itself.
Embodied intelligence is more valuable than humanoid robots
Embodied intelligence can be thought of as an AI brain whose carrier can take any form: a robotic arm, a robotic dog, or even a car.
By contrast, humanoid robots are currently seen as steel giants of limited intelligence, mainly because they lack an AI brain and their bodies are not very flexible.
In simple terms, large models like GPT-4 cannot directly act on the physical world, while embodied intelligence has a physical body. It collects environmental information through sensors and performs physical operations using mechanical actuators, or interacts with humans and the environment in real time through robots or other physical entities.
Tesla once said that although one day everyone may have a humanoid robot, the currently demonstrated Optimus humanoid robot can only perform repetitive and simple tasks.
The goal of embodied intelligence is to enable machines to better understand and adapt to complex environments, solve problems more efficiently, and possess more flexible behavior. By integrating perception, decision-making, and execution processes, embodied intelligence allows machines to approach human-level intelligence, thereby playing an important role in robotics, autonomous driving, intelligent manufacturing, and other fields.
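The perception, decision-making, and execution cycle described above can be illustrated with a toy loop. This is only a minimal sketch under stated assumptions: `SimulatedArm`, `Observation`, `decide`, and `run_agent` are hypothetical names invented for illustration, and the one-dimensional "rail" stands in for a real robot body and its sensors.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """What the (simulated) sensor reports: signed distance to the target."""
    distance_to_target: float


class SimulatedArm:
    """Hypothetical 1-D 'body': a gripper that slides along a rail."""

    def __init__(self, position: float, target: float) -> None:
        self.position = position
        self.target = target

    def sense(self) -> Observation:
        # Perception: read the environment through a sensor.
        return Observation(self.target - self.position)

    def act(self, step: float) -> None:
        # Execution: an actuator changes the physical state.
        self.position += step


def decide(obs: Observation, max_step: float = 1.0) -> float:
    """Decision-making: move toward the target, at most max_step per tick."""
    return max(-max_step, min(max_step, obs.distance_to_target))


def run_agent(body: SimulatedArm, tolerance: float = 1e-6, max_ticks: int = 100) -> int:
    """Repeat sense -> decide -> act until the target is reached; return ticks used."""
    for tick in range(max_ticks):
        obs = body.sense()
        if abs(obs.distance_to_target) <= tolerance:
            return tick
        body.act(decide(obs))
    return max_ticks


arm = SimulatedArm(position=0.0, target=3.5)
ticks = run_agent(arm)
print(f"reached {arm.position:.1f} after {ticks} ticks")
```

In a real system each of the three steps would be far more complex, but the closed loop between sensing and acting is what separates an embodied agent from a model that only produces text.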
Karpathy stated that seven years ago, in 2016, the time was not ripe for researching AI agents, and technological limitations kept the results poor. As a result, he and OpenAI changed direction and started researching large language models. Now that new techniques are available for studying AI agents, the situation is completely different:
The simplest example is that no one studies AI agents with reinforcement learning methods the way they did in 2016. Today's research methods and directions would have been unimaginable back then.
The Next Wave of AI?
The emergence of large language models (LLMs) has brought new possibilities for building embodied intelligent agents. Because LLM-based agents can leverage the world knowledge embedded in pre-trained models to generate consistent action plans or executable strategies, they are particularly suitable for tasks such as gaming and robotics.
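As a rough illustration of how an LLM-based agent might turn a task into an executable plan, consider the sketch below. It rests on several assumptions: `stub_language_model` is a stand-in that returns a canned answer instead of calling a real model, and the skill names in `ALLOWED_SKILLS` and the one-step-per-line format are hypothetical, not taken from any system mentioned in this article.

```python
from typing import Callable

# Hypothetical skills the robot's low-level controller is assumed to expose.
ALLOWED_SKILLS = {"move_to", "pick", "place"}


def stub_language_model(prompt: str) -> str:
    """Stand-in for a real LLM call; returns a canned plan, one step per line."""
    return "move_to cup\npick cup\nmove_to shelf\nplace cup"


def plan_task(task: str, model: Callable[[str], str]) -> list[tuple[str, str]]:
    """Ask the model for a plan and keep only steps that map to a known skill."""
    prompt = f"Task: {task}\nAnswer with one '<skill> <object>' step per line."
    steps = []
    for line in model(prompt).splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ALLOWED_SKILLS:
            steps.append((parts[0], parts[1]))
    return steps


plan = plan_task("put the cup on the shelf", stub_language_model)
for skill, target in plan:
    print(skill, target)
```

Filtering the model's output against a fixed skill set is one simple way to keep a generated plan executable; real systems would need much richer grounding in what the robot can actually perceive and do.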
DeepMind's RoboCat is just one prominent example of an AI-empowered robot.
Several companies have moved in this direction this year. In early 2023, Google introduced the visual language model PaLM-E and applied it to industrial robots. In April, Alibaba integrated the Qianwen large model into industrial robots. In May, Tesla's humanoid robot Optimus demonstrated precise control and perception capabilities, and in the same month, NVIDIA released a new autonomous mobile robot platform.
As a result, robots with embodied intelligence have attracted widespread attention worldwide.
At Tesla's 2023 shareholder meeting, humanoid robots were described as the company's main long-term source of value:
"If the ratio of humanoid robots to humans is around 2 to 1, then the demand for robots could be 10 to 20 billion, far exceeding the number of electric vehicles."
Jensen Huang, the founder of NVIDIA, also stated at the ITF World 2023 Semiconductor Conference that the next wave of AI will be "embodied intelligence". Wall Street News previously pointed out that Guosheng Securities analysts believe that embodied intelligence has the characteristics of physical feedback and physical output, and can become a new carrier for communication, computation, and storage:
In the future, embodied intelligence will increasingly emphasize the matching and coupling of edge communication capabilities and edge computing power.
The physical form of AI is not actually the most important aspect; the core lies in developing the AI brain, establishing seamless human-machine interaction, and enabling AI to actively perceive the physical world. Only by following human-like reasoning paths can AI deliver the behavior that humans expect. Machine vision and multimodal large models are the two keys to making this possible.