Microsoft and Google Release New AI Models on the Same Day: Voice, Image, and Locally Run Open-Source Models

Microsoft and Google announced new AI models on the same day. Microsoft launched its MAI foundation models, MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2, which are available primarily through Azure Foundry. Google released its Gemma 4 open-source models under the Apache 2.0 license, offering advanced reasoning and generation capabilities optimized for local execution. The two launches differ significantly in both features and delivery.

Microsoft and Google both announced new AI models on Thursday, with notable differences. Microsoft's new MAI foundation models are available only through Azure Foundry and the US-only MAI Playground, while Google's all-new Gemma 4 open-source models can run locally. Google has also switched the license for these new open-source models to Apache 2.0.

Three "World-Class" In-House MAI Models

Microsoft's "world-class" in-house MAI lineup consists of three models:

First is MAI-Transcribe-1, a "state-of-the-art" speech-to-text model that understands the world's 25 most widely spoken languages and, for batch transcription, runs 2.5 times faster than Microsoft's existing Azure Fast offering.

Second is MAI-Voice-1, a new speech generation model capable of producing 60 seconds of audio in just 1 second. It also supports creating custom voices from short audio samples within Microsoft Foundry.

Finally, MAI-Image-2 is a faster text-to-image model that has already begun rolling out in Copilot and will gradually come to Bing and PowerPoint.

Microsoft stated:

"We are rapidly deploying these top-tier models to power our own consumer and commercial products. You will soon see more models in Foundry and across Microsoft's various products and experiences."

Google's Gemma 4 Open-Source Model

Google's Gemma 4 open-source models are released under the Apache 2.0 license, replacing the custom Gemma license used for earlier versions. According to Google, the models offer advanced reasoning, agentic workflows, code generation, and vision and audio capabilities, and come in four sizes optimized for local execution, even on "billions of Android devices."

Google commented:

"Gemma 4 is based on the same world-class research and technology as Gemini 3 and represents the most capable set of models you can run on local hardware today. They complement our Gemini models, offering developers the industry's most powerful combination of open-source and proprietary tools."

The larger 26B and 31B Gemma 4 models are designed to run on consumer GPUs and can power IDEs, coding assistants, and agentic workflows. The lighter E2B and E4B models focus on multimodal capabilities and low-latency processing, making them suitable for mobile and IoT devices, including the Raspberry Pi. All of the models can also run fully offline.

The Gemma 4 models are available for download from several platforms, including Hugging Face, Kaggle, and Ollama. Google emphasized:

"These models adhere to the same stringent safety protocols for infrastructure security as our proprietary models."

