Wallstreetcn
2023.12.15 08:46

OpenAI, Microsoft, and Meta Platforms have all entered the arena, and the overseas battle over AI devices is converging on one form factor: glasses!

According to reports, Google is still developing smart glasses software, OpenAI and Snap are joining forces, Meta and Ray-Ban are combining their strengths, Apple has laid the groundwork on the hardware side, and Amazon plans to launch a multimodal AI device.

Global AI competition is set to reach a new peak next year, and the tech giants' battle over smart glasses will be one of its focal points.

With the strong rise of multimodal AI, companies such as Meta, Google, Microsoft, and OpenAI are competing to apply more powerful AI technologies to smart glasses and other wearable devices.

According to the latest report from The Information, although Google has terminated its augmented reality (AR) glasses project, it is still developing software for smart glasses. Last week, Google's most powerful AI model, Gemini, demonstrated its multimodal capabilities, taking the first step towards creating an "always-on" AI assistant. However, it may still take several years to truly achieve this goal.

Citing an insider, the media reported that OpenAI is considering embedding its object recognition software, GPT-4 with Vision, into Spectacles, the smart glasses made by Snapchat parent Snap, which could bring new features to the product.

Meta has embedded a multimodal AI voice assistant into the smart glasses it is developing with luxury sunglasses company Ray-Ban. The assistant can describe what the wearer sees, suggest shirt-and-pants combinations, and translate Spanish text into English.

In addition, Amazon has been discussing a new type of AI device in recent months, which is said to have similar visual capabilities.

Google: Still Developing Software for Smart Glasses

According to reports, although Google canceled the development of smart glasses as early as mid-year, it is still developing software for them. Google plans to license the software to hardware manufacturers, similar to the way it developed the Android mobile operating system for smartphone manufacturers like Samsung.

Google released a video last week showcasing some of Gemini's capabilities, such as recognizing a movie that a user acts out, offering suggestions about objects placed in front of it, and learning new games.

The Gemini family currently has three members, Gemini Ultra, Gemini Pro, and Gemini Nano, which will be made available to different customer groups. However, the more advanced version of Gemini shown in the video has not been released publicly, and the interactions in the video were edited rather than captured in real time.

Nevertheless, the video demonstrates Google's vision of creating an "always-on" AI assistant that can respond and understand in real-time what users are doing and seeing.

Insiders told The Information that this kind of "ambient computing" is still several years away. As a first step, Google is redesigning the Pixel phone's operating system to embed a small Gemini model that powers the Pixie AI assistant, enabling it to handle more complex, multimodal tasks. According to earlier reports, Pixie can, for example, recommend nearby stores where a user can buy a product based on photos they have taken.

Google's core search business is about predicting and supplying the information users need, so building AI devices of this kind fits its positioning well. Its attempt at smart glasses a decade ago, however, was a failure: the design was awkward, the practical uses were limited, and users never embraced the product.

Later, Google adjusted course on the camera idea and pushed Android phone manufacturers to turn their phone cameras into a "third eye" that could scan the environment and send images to Google's cloud for analysis, providing users with contextual information. That idea eventually evolved into the image search application Google Lens.

OpenAI: Joining forces with Snap

According to a source cited by The Information, OpenAI is considering embedding its object recognition software, GPT-4 with Vision, into Spectacles, the smart glasses product of Snapchat parent Snap. This could bring new functionality to the product.
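
The report gives no technical detail about how such an integration would work. Purely as an illustration (not attributed to OpenAI or Snap), the sketch below sends a single camera frame to OpenAI's vision-capable GPT-4 model through the public chat completions API and asks for a description of the scene; the file name, prompt, and any wiring to glasses hardware are hypothetical.

```python
# Illustrative sketch only: ask a vision-capable OpenAI model to describe
# one camera frame. File name and prompt are placeholders.
import base64
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def describe_frame(image_path: str) -> str:
    # Encode the captured frame as base64 so it can be sent inline.
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    response = client.chat.completions.create(
        model="gpt-4-vision-preview",  # vision-capable model at the time of writing
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Briefly describe what is in front of me."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                },
            ],
        }],
        max_tokens=200,
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(describe_frame("frame.jpg"))  # e.g. a frame captured by a glasses camera
```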

As early as March this year, OpenAI demonstrated its AI software building a website from a hand-drawn sketch. Since then, perhaps to make fuller use of its large models, OpenAI CEO Sam Altman has repeatedly expressed interest in building a new AI-powered consumer device.

It is worth noting that OpenAI does not have its own hardware team, but it could partner with other companies, such as a device maker like Snap or an AI chip design firm.

In addition, Altman has invested in Humane, a maker of camera-equipped AI devices that aims to build AI hardware capable of replacing the smartphone.

Microsoft: Actively advancing AI technology development for smart glasses

Microsoft is actively advancing AI technology that can be applied to smart glasses and other small devices. These technologies may be based on speech or image recognition, with the aim of powering a wider range of smart hardware.

According to The Information, this work may be based on Microsoft's existing HoloLens AR headset.

The report states that Microsoft is embedding AI software into HoloLens, allowing users to use voice commands to discuss objects captured by the headset's camera with a chatbot powered by OpenAI technology.

Apple: Prepared on the hardware front

Apple lags behind its competitors in multimodal AI technology, but it has made some progress in the field. Specifically, it has laid the groundwork for applying multimodal AI on its upcoming Vision Pro headset.

Apple has been behind its peers in AI algorithms and only started focusing on large language models (LLMs) this year, with previous research only in the preliminary stages. According to The Information, there is currently no indication that Vision Pro will have complex multimodal capabilities in the near future.

However, Apple has been working on enhancing the computer vision capabilities of Vision Pro, enabling it to quickly recognize the surrounding environment, such as identifying furniture and determining whether it is in a living room, bedroom, or kitchen. Apple is currently developing a multimodal model that can recognize images and videos.

One major obstacle for Vision Pro is its large, bulky design, which makes it ill-suited to wearing outdoors. Earlier this year, Apple reportedly suspended development of its AR glasses to focus on launching the headset. It is unclear when the glasses project will be restarted, but it may eventually gain multimodal functionality.

Meta: Combining Strengths with Ray-Ban

On December 12, Meta and luxury sunglasses company Ray-Ban launched the Ray-Ban Meta smart glasses, which offer AI functions such as taking photos, estimating food calories, identifying plants, and translating text. Judging from media reports and Mark Zuckerberg's own demo, the AI performance of the glasses appears to be quite good.

As early as September this year, in an interview with the media, Zuckerberg revealed that Meta would introduce multimodal AI functions on smart glasses. Multimodal refers to AI that supports various forms of media input, such as text, images, and voice.

It is reported that Meta also plans to enable the glasses to capture sensor data from the wearer's body, further enhancing their multimodal capabilities. The Qualcomm AI chip in the glasses appears to be performing well, and Meta plans to further optimize the user experience.

Currently, Meta glasses are available for sale at a price of $300, and the AI functions are in the early testing phase, only open to a limited number of users. However, it is worth noting that Meta has stated that it will use anonymous data to help improve the AI services of the glasses, which may make many privacy-conscious users uncomfortable.

Amazon: Plans to Launch Multimodal AI Device

According to sources familiar with the project, Amazon's Alexa team plans to launch a new device capable of running multimodal AI, possibly as soon as next summer.

The team is particularly interested in reducing the AI computing and memory requirements for processing images, videos, and voice on the device.
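
The report does not say how the team intends to shrink those requirements. As a purely generic illustration (not something attributed to Amazon), one common technique is to quantize a model's weights to 8-bit integers so it needs less memory and compute on a device; a minimal PyTorch sketch with a toy stand-in model:

```python
# Illustrative sketch only: dynamic int8 quantization as a generic way to
# cut a model's on-device memory footprint. The toy model is a stand-in.
import os
import torch
import torch.nn as nn

# A toy stand-in for a larger speech or vision model.
model = nn.Sequential(
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 128),
)

# Convert Linear layers to use int8 weights at inference time.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

def size_mb(m: nn.Module, path: str = "tmp.pt") -> float:
    # Serialize the state dict and measure its size on disk.
    torch.save(m.state_dict(), path)
    size = os.path.getsize(path) / 1e6
    os.remove(path)
    return size

print(f"fp32 weights: {size_mb(model):.2f} MB")
print(f"int8 weights: {size_mb(quantized):.2f} MB")
```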

The report states that it is currently unclear whether the project has received funding support or what problem the device intends to solve for customers, but it is separate from Amazon's existing Echo series of voice assistant devices.

The Alexa team has been working on the development of new devices for years, including smart audio glasses called Echo Frames. However, it is currently unclear whether Amazon will develop a device with visual recognition capabilities based on these glasses, as they do not have a display screen or camera.