The AI search engine is here! Google makes a big move, releasing its most powerful AI model and challenging OpenAI's voice capabilities

Wallstreetcn
2024.05.14 17:10

Gemini 1.5 Pro introduces a context window of 2 million tokens, which Google claims is the longest of any chatbot worldwide; Gemini gains a new voice conversation feature, Live, to compete with OpenAI's new model GPT-4o; Gemini will be customizable to users' needs; and Google's multimodal AI project Project Astra can answer questions about objects captured by a phone's camera, while Gemini on Android adds multimodal functionality.

Author: Li Dan

Source: Hard AI

Although OpenAI got ahead with a blockbuster product demonstration, as Wallstreetcn reported, Google followed up by doing something OpenAI has yet to do: it became the first to release an artificial intelligence (AI) search engine, defending its dominant position in search while also challenging OpenAI's newly released flagship model GPT-4o with an upgraded, more powerful version of its AI model Gemini.

During the annual Google I/O developer conference held on Tuesday, May 14, Eastern Time, Google CEO Sundar Pichai said that all of Google's work revolves around its generative AI model Gemini. "We want everyone to benefit from what Gemini does," he said, adding that AI search is one of the services into which Gemini has been integrated.

Pichai announced that AI Overviews, a feature that adds AI-generated summaries to Google Search results, will launch this week in the United States and will soon roll out to more countries and regions.

Through multi-step reasoning, Gemini can do research on users' behalf and surface better search results. In Google Search, for example, Gemini can plan a user's diet by pulling together a full day's meals and recipes. And if a user finds cooking a hassle, Search can, with Gemini's help, find places to buy the meals they need.

With Gemini's assistance, the search results page will also change. When looking for restaurants with live music, for instance, it can even tailor recommendations to the season, such as showing restaurants with rooftop seating.

Pichai demonstrated on stage how Gemini's capabilities enable more relevant searches in Google Photos. Through a new feature called Ask Photos, for example, users can ask for their license plate number; Gemini uses context to search the photo library and pick out the photo that shows it.

Many services in Google Workspace, Google's cloud productivity and collaboration platform, will also be combined with Gemini: for example, using Gemini to search Gmail for emails from a specific sender, or to pull out the highlights of Google Meet video meetings.

Gemini: Google's Latest AI Advancements

Gemini can also search users' phones, helping them find receipts or schedule a pickup window. And for users planning a trip, Gemini can search for interesting activities. Pichai said Google is "making AI helpful for everyone."

Google announced that users will be able to ask questions in Search directly through video. Google executives demonstrated using video search to repair a broken record player: record a video showing the problem, ask why it is not working properly, and Google Search analyzes the footage frame by frame to answer the question.

Gemini 1.5 Pro: Context Window of 2 Million Tokens, the Longest Globally

Google announced that in the three months since the launch of Gemini Advanced, more than one million users have signed up.

Starting this Tuesday, Google is adding a new model, Gemini 1.5 Pro, to Gemini Advanced, claiming the longest context window of any consumer chatbot worldwide, starting at 1 million tokens. Gemini 1.5 Pro will be available to Gemini Advanced subscribers in more than 150 countries and regions, with support for more than 35 languages.

Pichai stated that Gemini 1.5 Pro "provides the longest context window of any foundation model to date." He said Gemini 1.5 Pro will be expanded to a context window of 2 million tokens, double the current model's 1 million.
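For developers, the scale of that window is easiest to grasp through Google's public google-generativeai Python SDK. The sketch below is illustrative only, not from Google's keynote; the API key and input file are placeholders, and the model name follows the SDK's public naming at launch.

```python
# Minimal sketch: measuring how much of Gemini 1.5 Pro's context window a
# prompt consumes, via the public google-generativeai SDK
# (pip install google-generativeai). Key and file name are placeholders.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder credential
model = genai.GenerativeModel("gemini-1.5-pro")  # public model name at launch

with open("long_document.txt", encoding="utf-8") as f:  # placeholder input
    text = f.read()

# count_tokens reports total_tokens, which can be compared against the
# model's advertised 1M-token (later 2M-token) context window.
print(model.count_tokens(text).total_tokens)

# A single call can then reason over the entire document at once.
response = model.generate_content(["Summarize the key points of:", text])
print(response.text)
```

The point of such a window is that an entire codebase, book, or hour of video can fit into one prompt, rather than being chunked and retrieved piecemeal.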

Gemini's New Voice Conversation Feature Live; Customizable Versions of Gemini

Google announced that this summer it will expand Gemini's multimodal capabilities, including the ability to hold in-depth, two-way conversations by voice, a feature called Live. Through Gemini Live, users can converse with Gemini and choose the voice it replies with from a range of natural-sounding options. Users can even speak at their own pace, or interrupt mid-response to clarify a question, just as in any human conversation.

Project Astra Answers Questions About Objects Captured by the Phone Camera; Gemini on Android Adds Multimodal Functionality

Google announced a new multimodal AI project called Project Astra, which can explain to users what their smartphone captures. In a video Google showed, simply pointing the phone's camera at an object lets Gemini identify it, such as a red apple, and answer questions like which objects in the frame can make sound.

Google also announced that it will soon add multimodal functionality to the Gemini Nano model, meaning users' phones will be able to understand the world the way users do: through text, images, sound, and speech.
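Project Astra itself was shown only as a research demo with no public API, but the "point the camera and ask" interaction maps onto the multimodal image-plus-text prompting already exposed in the same SDK. The sketch below is an analogue under that assumption, not Astra's actual code; the image file stands in for a captured camera frame.

```python
# Illustrative analogue of Astra-style "ask about what the camera sees",
# using multimodal (image + text) prompting in the google-generativeai SDK.
# Project Astra has no public API; camera_frame.jpg is a placeholder.
import google.generativeai as genai
import PIL.Image

genai.configure(api_key="YOUR_API_KEY")  # placeholder credential
model = genai.GenerativeModel("gemini-1.5-pro")

frame = PIL.Image.open("camera_frame.jpg")  # placeholder camera capture
response = model.generate_content(
    [frame, "What is this object, and which objects in the frame can make sound?"]
)
print(response.text)
```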

Google stated that Gemini Nano on Android will become more helpful and context-aware. This year, Android phone users will be able to drag and drop generated images into Google Messages and Gmail, and to ask questions about YouTube videos and PDF files directly on their phones and get answers.

Google also said that later this year, Gemini Nano will enhance TalkBack, Android's accessibility feature: image descriptions will become clearer and richer, helping blind and low-vision users navigate their phones through voice feedback.

Commenting on Tuesday's releases and demos on social media platform X, Google Health AI product manager Charlene Wang wrote that beyond AI agents and AI Teammates, her main takeaway was that Gmail, Search, Workspace, and even Chat will become more useful in the coming months. Many products now offer attention-grabbing, killer user experiences, she noted, and the idea of organizing and syncing all of one's content in one place will be the most convincing reason to use Google's products.

Some netizens felt that Google's event as a whole did not reach Apple's level and called on Google executives to learn from Apple; they said they liked what they saw of Project Astra but were not very excited, since OpenAI had already released something similar on Monday.

Other netizens noted that nothing about Android 15 or related hardware came up during Tuesday's event, and wondered whether Google is saving those for a reveal at its October event later this year.