Google and OpenAI point the way: are AI assistants the first "killer AI application" and the battleground for AI smartphones?
Some observers believe that although the two AI assistants appear powerful, their practical utility remains unclear.
Author: Li Xiaoyin
Source: Hard AI
Following OpenAI's release of a heavyweight new product, Google made a bold move of its own and went head to head with GPT-4o.
On Tuesday, May 14, local time, at its annual Google I/O developer conference, Google CEO Sundar Pichai unveiled a series of new AI products and features. These included AI Overviews, which generates summaries for search; an expansion of the Gemini 1.5 Pro context window to 2 million tokens; the multimodal Gemini Nano model; Trillium, the sixth-generation TPU chip; and more.
On the AI search front, Google introduced a series of updates. Notably, Google unveiled Project Astra, a multimodal AI project designed to process inputs such as audio and video.
A demonstration video showed that Astra can identify objects through a mobile phone camera and even recognize the location.
In both positioning and functionality, this Google AI assistant clearly poses a threat to GPT-4o.
Chirag Shah, a professor at the University of Washington specializing in online search, commented:
"In the end, you will have an agent that truly understands you, can do many things for you, and execute commands across tasks and domains."
Google also announced at the conference that, starting this summer, Gemini will support real-time voice interaction, with real-time video interaction to follow later this year. In the coming months, Google will also introduce Gems, a custom AI assistant feature similar to OpenAI's GPTs, which can interact with the entire "Google ecosystem."
The first "killer application" of AI?
Judging from the demonstrations by OpenAI and Google, GPT-4o can currently handle only still images, whereas Astra can process video, which gives it a significant advantage.
Furthermore, Google announced numerous updates to its large model Gemini 1.5 Pro at the conference, giving it more natural voices, longer conversations, better understanding of audio and images, stronger logical reasoning and planning, and better code generation.
However, the technological innovation behind GPT-4o is equally impressive. This natively multimodal model reportedly takes speech as input and produces speech as output directly, without a separate speech-to-text conversion step, which significantly reduces processing time. The number of parameters needed to execute tasks has also reportedly been greatly reduced, improving speed and lowering costs.
At this stage of development, it is hard to say whether OpenAI's or Google's AI assistant has the upper hand, but both companies clearly attach great importance to this field.
According to earlier media reports, Apple is also considering integrating GPT technology into Siri, its voice assistant, to power AI features.
With tech giants making successive moves, does this mean that AI assistants will become the next "killer app" in AI?
The answer is still uncertain.
Some analysts point out that although the use cases demonstrated for GPT-4o and Astra are interesting, "almost none" of them help people with their work. In other words, while these two AI assistants may seem powerful, their actual utility remains unknown.
The analysis suggests that if AI assistants can better understand users' personal preferences in the future, their "agent" capabilities may improve, helping users actually complete everyday tasks such as online shopping, making reservations, and filling out forms.
What needs to be addressed next for AI on smartphones?
Although OpenAI's and Google's AI assistants can be operated directly through speech, video, and other modalities, some believe they still cannot be called true AI assistants.
The reason is that, although GPT-4o and Astra can answer questions and perform searches, they cannot actually carry out tasks.
Wall Street News previously noted that one of OpenAI's pain points in developing on-device (edge) AI is obtaining permissions on the device, both at the application level and at the system level. This may also be one reason it is seeking to collaborate with Apple.
For now, until AI assistant products are truly integrated into mobile operating systems, the position of built-in voice assistants such as Siri cannot be shaken.
Some argue that certainty matters more than AGI (artificial general intelligence): reliability comes first.
In this view, even the best AI systems today are not ready to truly serve as personal assistants. The voice assistants built into phones may be less "interesting," but at least they are less likely to make mistakes.