
Record scores on the toughest benchmarks: Google gives its Gemini 3 Deep Think model a major upgrade aimed at scientific research and engineering applications

The model scored 48.4% on the "Humanity's Last Exam" (HLE) benchmark without the aid of tools and 84.6% on the ARC-AGI-2 test, and it reached gold-medal level on the written portions of both the 2025 International Physics Olympiad and the 2025 International Chemistry Olympiad. Google stated that the new model is driving discoveries and helping researchers solve "intractable" problems, from identifying flaws in research papers to optimizing semiconductor crystal growth.
Google's deep thinking model Gemini 3 Deep Think has undergone a significant upgrade, advancing its professional reasoning capabilities from abstract theory to practical application scenarios. This upgrade focuses on addressing complex challenges in modern scientific research and engineering, marking Google's strategic bet in the enterprise AI market.
On Thursday, December 12th, Eastern Time, Google officially announced the upgrade of Gemini 3 Deep Think, stating that the upgraded model achieved breakthrough results on multiple industry benchmarks, including 48.4% on "Humanity's Last Exam" (HLE) without tools and 84.6% on ARC-AGI-2, the latter verified by the ARC Prize Foundation. On the competitive programming platform Codeforces, Gemini 3 Deep Think received an Elo rating of 3455.

The upgraded deep thinking model is now available to Google AI Ultra subscribers, while early access is provided to select researchers, engineers, and enterprise users through the Gemini API. Google stated that the model has demonstrated its application value in real research, from identifying logical flaws in research papers to optimizing semiconductor material growth processes.
This release positions Google in direct competition with OpenAI's o1 series and Anthropic's Claude in the AI reasoning model race. As general AI capabilities become increasingly commoditized, professional reasoning abilities have emerged as a new battleground in the enterprise market, and the launch of the deep thinking model shows that Google is unwilling to concede in this high-value field.
From Benchmark Tests to Gold Medal Performance
On its official blog, Google emphasized the model's performance on rigorous academic benchmarks. In addition to the results above, Gemini 3 Deep Think reached gold-medal level on the written portions of the 2025 International Physics Olympiad and Chemistry Olympiad, and scored 50.5% on CMT-Benchmark, a test of advanced theoretical physics.
Score comparisons provided by Google show that, as of this month, Gemini 3 Deep Think outperformed the strongest models from Anthropic and OpenAI across the listed tests, and also outperformed the thinking mode of the Gemini 3 Pro preview version.
For example, on the ARC-AGI-2 test, Gemini 3 Deep Think achieved an accuracy rate of 84.6%, while Anthropic's Claude Opus 4.6 Thinking Max scored 68.8% and OpenAI's GPT-5.2 Thinking xhigh scored 52.9%.
The Google team stated that this upgrade was completed in close collaboration with scientists and researchers, aiming to address research challenges characterized by "lack of clear boundaries or a single correct answer, and data that is often messy or incomplete." The model achieves a leap from abstract theory to practical application by combining deep scientific knowledge with practical engineering capabilities.
Beyond its gains in mathematics and programming, the deep thinking model now performs strongly across multiple scientific fields, among them chemistry and physics (theoretical physics included). This breadth means the model is no longer limited to specific disciplines and can serve as a cross-disciplinary research tool.
Practical Application Cases Validate Value
Use cases from early testers demonstrated the model's practical potential. Lisa Carbone, a mathematician at Rutgers University, used the deep thinking model to review a highly specialized mathematical paper while researching the mathematical structures required for high-energy physics. The model identified a subtle logical flaw that had gone unnoticed through human peer review.
At Duke University, the Wang lab used the deep thinking model to optimize manufacturing methods for complex crystal growth, aimed at discovering potential semiconductor materials. The model successfully designed a formula that grew films exceeding 100 microns, achieving precision targets that previous methods struggled to meet.
Anupam Pathak, head of R&D for Google's Platforms and Devices division and former CEO of Liftware, tested the new version of the deep thinking model to accelerate the design of physical components.
Another application scenario showcased by Google demonstrated that with the upgraded Gemini 3 Deep Think, users can convert sketches into 3D printable physical models. The model can analyze drawings, model complex shapes, and generate files for 3D printing.
Strategic Layout in the Enterprise Market
This upgrade reflects a trend in the AI industry: a shift from general chatbots to specialized reasoning engines capable of handling professional-level problems. For enterprise clients, the evaluation criteria are changing. The focus is no longer solely on which AI can write code or summarize documents the fastest, but on reasoning capability: whether the model can handle complex financial models, analyze experimental data and identify methodological flaws, or assist in patent research and drug discovery.
Google's advantage lies in its integration capabilities. The deep thinking model does not exist in isolation but is part of the broader Gemini ecosystem, meaning it can leverage Google's vast knowledge graph, scientific datasets, and research partnerships. Researchers using the deep thinking model through Google Cloud theoretically have access to computational power and data sources that independent AI services cannot match.
The company stated on Thursday on the X platform: "The upgraded deep thinking model is already driving discoveries and helping researchers solve 'intractable' problems—from identifying flaws in research papers to optimizing semiconductor (crystal) growth." This statement emphasizes the model's ability to transition from testing benchmarks to practical applications.
From a product strategy perspective, Google is opening access to consumer and enterprise users at the same time. Google AI Ultra subscribers can use the model immediately in the Gemini app, while scientists, engineers, and enterprise users can apply for access through the Gemini API under an early access program. This layered strategy reflects Google's dual goals of maintaining a presence in the consumer market while competing for high-value enterprise clients.
The Competition in Reasoning Models Heats Up
The launch of the deep thinking mode has positioned Google in direct competition with OpenAI and Anthropic in the AI reasoning race. OpenAI's o1 model reportedly spends more time "thinking" before generating responses, using reinforcement learning to improve the reasoning chain. Anthropic's Claude 3 has carved out a niche in research and analysis tasks. Now, Google is staking its claim in the same field, backed by the infrastructure and distribution advantages brought by integration into Workspace and Cloud Platform.
For professional users, choosing between fast general responses and slower deep reasoning becomes a new architectural decision. Applications may route simple queries to standard models while escalating complex problems to the reasoning mode, creating a tiered approach to AI inference.
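The routing idea described above can be illustrated with a minimal sketch. It assumes a simple keyword-and-length heuristic. The tier names, the hint list, and the `route_query` function are hypothetical placeholders for illustration, not real model identifiers or any part of Google's API. A production system would more likely use a trained classifier or let a fast model decide when to escalate.

```python
from dataclasses import dataclass

# Hypothetical tier labels: placeholders, not real model identifiers.
STANDARD_TIER = "standard-model"
REASONING_TIER = "deep-reasoning-model"

# Hypothetical hints that a query needs multi-step reasoning.
REASONING_HINTS = ("prove", "derive", "optimize", "debug", "find the flaw")


@dataclass(frozen=True)
class RoutingDecision:
    tier: str    # which tier should handle the query
    reason: str  # human-readable explanation, useful for logging


def route_query(query: str, max_simple_words: int = 40) -> RoutingDecision:
    """Pick a tier for a query using a keyword-and-length heuristic."""
    text = query.lower()

    # Escalate when the query contains an explicit reasoning hint.
    for hint in REASONING_HINTS:
        if hint in text:
            return RoutingDecision(REASONING_TIER, f"matched hint: {hint}")

    # Escalate long queries, which tend to carry more context to reason over.
    if len(text.split()) > max_simple_words:
        return RoutingDecision(REASONING_TIER, "long query")

    # Everything else goes to the faster, cheaper tier.
    return RoutingDecision(STANDARD_TIER, "short query with no reasoning hints")


if __name__ == "__main__":
    for q in ("What time zone is Tokyo in?",
              "Prove that the sum of two even numbers is even."):
        decision = route_query(q)
        print(f"{decision.tier}: {q} ({decision.reason})")
```

Returning the reason alongside the tier keeps escalations auditable, which matters when the reasoning tier is slower and more expensive per request.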
On Thursday, Google posted on the X platform: "Gemini 3's deep thinking mode has excelled in benchmarks pushing the frontiers of intelligence. Specific data: achieving 48.4% on 'Humanity's Last Exam' (without tools), reaching 84.6% on ARC-AGI-2 (verified by the ARC Prize Foundation), and obtaining a 3455 Elo rating in Codeforces competitive programming."
Google also noted that the model now performs excellently in scientific fields such as chemistry and physics.
The true test of this competition lies not in announcements but in actual adoption. If research institutions and engineering companies begin to handle complex work through the deep thinking mode, it will validate Google's judgment that the future of enterprise AI lies in depth rather than speed. For now, the company has made its position clear: it is competing for the high end of the AI market, where thinking matters more than dialogue.
