Tech investment tycoon: Next year NVIDIA GPUs will overturn Google's TPU advantage

Wallstreetcn
2025.12.10 03:05

Investor Gavin Baker pointed out that Google's TPU currently leads NVIDIA's Hopper-generation chips on AI training cost, allowing Google to operate at a negative 30% profit margin to suppress competitors. However, the situation will reverse once Blackwell chip clusters come online in early 2026. In addition, Google's conservative design choices and supply-chain strategy in TPU development may limit its long-term competitiveness. Once it loses its cost advantage, Google's sustained loss-making operations will become unsustainable, reshaping the competitive landscape of the AI industry.

NVIDIA's next-generation Blackwell chips and their subsequent products will reshape the cost structure of AI training next year, potentially ending Google's cost advantage with TPU.

On December 9th, tech investment mogul Gavin Baker stated in a podcast interview that Google has maintained a low-cost advantage in AI training with its TPU chips.

In semiconductor terms, Baker said, Google's TPU chips are akin to "fourth-generation jet fighters," while NVIDIA's Hopper chips are still at the level of "World War II's P-51 Mustang." This cost advantage allows Google to run its AI business at a negative 30% profit margin, effectively "sucking the economic oxygen out of the AI ecosystem."

However, Baker emphasized that as NVIDIA's Blackwell chip cluster begins training in early 2026, along with the subsequent launch of the more easily deployable GB300 chip, this situation is about to reverse. Once Google loses its cost advantage, it could reshape the competitive landscape and economic model of the AI industry.

Blackwell's Complex Transformation Creates a Window of Opportunity for Google

The delayed deployment of Blackwell has created an unexpected window of advantage for Google.

Baker believes that the transition from Hopper to Blackwell is one of the most complex product transformations in tech history: the weight of data center racks has increased from about 1,000 pounds to 3,000 pounds, power consumption has jumped from 30 kilowatts to 130 kilowatts, and the cooling method has shifted from air cooling to liquid cooling. Baker vividly compares it to:

It's as if, to use a new iPhone, you had to rewire every outlet in your home to 220 volts, install a Tesla Powerwall, a backup generator, solar panels, and a whole-house humidification system, and reinforce the floors.

Because of these technical challenges, Blackwell chips only began large-scale deployment within the last three to four months.

In Baker's view, if it weren't for breakthroughs in inference techniques, AI progress would have completely stalled from mid-2024 until the Gemini 3 release. Inference effectively "saved AI," filling an approximately 18-month gap before the arrival of the new generation of chips.

Baker expects the first models trained on Blackwell to debut in early 2026, likely led by xAI.

Baker emphasized that xAI plays a key role for NVIDIA: its rapid deployment pace lets NVIDIA pack as many GPUs as possible into one data center as a coherent cluster, debugging the resulting issues on behalf of all its customers. "Coherent" here means every GPU knows the state of every other GPU, sharing memory over scale-up networks and scale-out connections.

More critically, NVIDIA's upcoming GB300 chip will be "plug-and-play" compatible, directly replacing existing GB200 racks without additional infrastructure modifications and making vertically integrated companies the new low-cost producers.

TPU Architecture Decision Limits Future Competitiveness

Google's conservative design choices and supply chain strategy in TPU development may limit its long-term competitiveness.

Gavin Baker points out that Google keeps the front-end design of TPU in-house but outsources the back-end design to Broadcom, which charges a gross margin of 50-55%.

With the TPU business estimated to reach about $30 billion by 2027, Google would be paying Broadcom roughly $15 billion a year. Baker notes that since Broadcom's entire semiconductor division runs on only about $5 billion in operating costs, Google has every economic reason to bring the whole semiconductor project in-house.
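Baker's insourcing case reduces to back-of-envelope arithmetic. A rough sketch using the article's figures (the payment amount, margin range, and operating-cost figure are Baker's estimates as quoted, not reported financials):

```python
# Back-of-envelope sketch of Baker's insourcing argument.
# All figures are the article's estimates, not reported financials.
tpu_spend_2027 = 30e9          # projected annual TPU business scale
broadcom_payment = 15e9        # estimated annual payment to Broadcom
broadcom_gross_margin = 0.50   # low end of the quoted 50-55% range

# Gross profit Broadcom keeps on Google's payments
broadcom_gross_profit = broadcom_payment * broadcom_gross_margin

# Broadcom's semiconductor division reportedly runs on ~$5B of
# operating costs, so the margin Google hands over exceeds what
# the same work might cost in-house -- the core of the argument.
broadcom_semis_opex = 5e9

print(f"Broadcom gross profit on TPU work: ${broadcom_gross_profit / 1e9:.1f}B")
print(f"Broadcom semis division opex:      ${broadcom_semis_opex / 1e9:.1f}B")
```

On these numbers, the profit ceded to Broadcom alone exceeds the division's entire operating cost base, which is why Baker frames insourcing as straightforwardly rational.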

Baker notes that Apple already follows this fully in-house model: rather than relying on an ASIC partner, it completes both front-end and back-end design itself, avoiding handing roughly 50% of the profit to a supplier.

Baker believes Google has already begun to act: bringing in MediaTek serves as a "warning" to Broadcom, since the Taiwanese ASIC house's gross margin is significantly lower than Broadcom's.

However, this differentiation among suppliers has also led Google to be more conservative in design, making it difficult for TPU's development speed to keep up with the annual iteration pace of NVIDIA GPUs.

In contrast, NVIDIA and AMD's strategy is to "release a new GPU every year, making it impossible for competitors to keep up." Splitting orders across a second supplier may further slow TPU's pace of evolution.

Strategic Calculations Will Undergo Fundamental Changes

Once Google loses its status as the lowest-cost producer, its strategic calculations will undergo fundamental changes.

As the low-cost producer, running the AI business at a negative profit margin to suppress competitors is entirely rational economics: it starves rivals that depend on external financing and ultimately wins dominant market share.

However, when the Blackwell cluster shifts to inference applications and the cost dynamics change, maintaining a negative 30% profit margin will become "very painful" for Google, potentially affecting its stock performance. This will have profound implications for the economic landscape of the entire AI industry.

Baker emphasizes that after the launch of the next-generation Rubin chip, the gap between NVIDIA GPUs and TPUs and other ASICs will widen further.