Report: NVIDIA will launch a "new inference chip" incorporating Groq LPU design at next month's GTC conference

Wallstreetcn
2026.02.28 03:58

NVIDIA's upcoming inference chip system integrates Groq's "Language Processing Unit" (LPU) technology, an architecture fundamentally different from traditional GPUs. Through more extensive SRAM integration and 3D stacking, it is optimized specifically for the latency and memory-bandwidth bottlenecks of large-model inference. The new product may be based on the next-generation Feynman architecture and could significantly reduce the energy consumption and cost of AI agents. OpenAI has committed $30 billion in purchases and investment.

NVIDIA plans to release a new inference chip integrating Groq's "Language Processing Unit" (LPU) technology at next month's GTC developer conference, marking the company's accelerated push into inference computing to meet urgent customer demand for efficient, low-cost compute.

According to The Wall Street Journal, this new system, described by NVIDIA CEO Jensen Huang as "something the world has never seen," is designed specifically to accelerate query responses for AI models. The launch of this product is expected to reshape the current AI computing power market landscape, directly impacting cloud service providers and enterprise investors seeking more cost-effective alternatives.

In an early sign of market recognition for the technology, ChatGPT developer OpenAI has agreed to become one of the largest customers for the new processor and announced plans to purchase large-scale "dedicated inference capacity" from NVIDIA. The move not only solidifies NVIDIA's core customer base but also sends a clear signal to the market: the infrastructure underpinning autonomous AI agents is shifting from large-scale pre-training to efficient inference.

In the face of fierce competition from Google, Amazon, and numerous startups, NVIDIA is breaking away from its traditional reliance on graphics processing units (GPUs). By introducing new technological architectures and exploring pure central processing unit (CPU) deployment models, the company aims to continue consolidating its market dominance in the next phase of AI industry evolution.

Integrating LPU Design to Address Large Model Inference Bottlenecks

As the AI industry shifts from model training to real-world deployment, inference computing has become the core focus. AI inference consists of two main stages, prefill and decode, and the decode stage of large models is particularly slow. To attack this bottleneck, NVIDIA has opted to push past physical limits by integrating outside technology.
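The two stages mentioned above can be sketched in a few lines. This is an illustrative toy, not any vendor's implementation: prefill processes the whole prompt in one parallel pass and builds a key/value cache, while decode emits one token at a time and must re-read that growing cache at every step, which is why decode tends to be slow and memory-bound.

```python
def prefill(prompt_tokens):
    # Prefill: process the entire prompt in one parallel pass,
    # producing a key/value cache entry for every prompt position.
    return [f"kv({t})" for t in prompt_tokens]

def decode(kv_cache, max_new_tokens):
    # Decode: generate one token at a time. Each step must revisit the
    # whole (growing) KV cache, so throughput is limited by memory
    # traffic rather than raw compute.
    output = []
    for step in range(max_new_tokens):
        next_token = f"tok{step}"             # stand-in for a model forward pass
        kv_cache.append(f"kv({next_token})")  # cache grows by one entry per step
        output.append(next_token)
    return output

cache = prefill(["The", "quick", "brown"])
tokens = decode(cache, max_new_tokens=4)
print(len(cache), tokens)  # cache now holds prompt + generated entries
```

The sequential, step-by-step nature of the decode loop is exactly what LPU-style designs target.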

According to The Wall Street Journal, NVIDIA spent $20 billion at the end of last year to license key technology from the startup Groq and brought in an executive team, including founder Jonathan Ross, in a large-scale acqui-hire deal. Groq's "Language Processing Unit" (LPU) uses an architecture fundamentally different from traditional GPUs and is exceptionally efficient at inference workloads.

Industry analysts believe the upcoming product may involve the disruptive next-generation Feynman architecture. According to a previous Wallstreetcn article, the Feynman architecture may adopt a more extensive SRAM integration scheme and even deeply integrate the LPU through 3D stacking, optimizing specifically for the two major inference bottlenecks, latency and memory bandwidth, and thereby significantly reducing the energy consumption and cost of running AI agents.
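A back-of-envelope estimate shows why memory bandwidth caps decode throughput, and why moving weights into faster on-chip SRAM helps. The figures below are illustrative assumptions (an H100-class HBM bandwidth of roughly 3.35 TB/s and an SRAM-style bandwidth of roughly 80 TB/s, as Groq has publicly claimed for its LPU), not specifications of the unannounced product:

```python
def max_decode_tokens_per_sec(weight_bytes, mem_bandwidth_bytes_per_sec):
    # Each decode step must stream (at least) all model weights from memory,
    # so bandwidth, not FLOPs, caps single-stream token throughput.
    return mem_bandwidth_bytes_per_sec / weight_bytes

# Hypothetical 70B-parameter model in 8-bit weights (~70 GB).
weights = 70e9

hbm = max_decode_tokens_per_sec(weights, 3.35e12)  # ~3.35 TB/s HBM (H100-class)
sram = max_decode_tokens_per_sec(weights, 80e12)   # ~80 TB/s on-die SRAM (Groq's claim)

print(round(hbm), round(sram))
```

Under these assumptions the bandwidth ceiling alone jumps by more than 20x, which is the intuition behind the SRAM-heavy, 3D-stacked design the article describes.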

Expanding Pure CPU Deployment to Provide Diverse Computing Options

While introducing the LPU architecture, NVIDIA is also adjusting how it deploys its traditional processors. Its standard practice has been to bundle the Vera CPU with its powerful Rubin GPU in data-center servers, but this configuration has proven too costly and energy-inefficient for certain AI agent workloads.

Some large enterprise customers have found pure-CPU environments more efficient for specific AI tasks. In line with this trend, NVIDIA announced this month an expanded collaboration with Meta Platforms, its first large-scale pure-CPU deployment, to support Meta's ad-targeting AI agents. The market views the collaboration as an early window into NVIDIA's strategic shift: the company is moving beyond a GPU-only sales model and trying to lock in different segments of the AI market with a diversified hardware lineup.

Market Demand Shifts Gears, Competitive Landscape Continues to Escalate

The evolution of this underlying hardware design is directly driven by exploding demand for AI agent applications across the tech industry. Many companies building and operating AI agents have found that traditional GPUs are too expensive and, in practice, not the best choice for running their models.

OpenAI's moves highlight this trend. Besides committing to purchase NVIDIA's new systems to improve its rapidly growing Codex tool, OpenAI reached a multi-billion-dollar computing partnership with startup Cerebras last month; Cerebras CEO Andrew Feldman says its inference-focused chips outpace NVIDIA's GPUs on speed. OpenAI has also signed a significant agreement to use Amazon's Trainium chips.

Major cloud providers, not just startups, are also accelerating their in-house chip efforts. Anthropic's Claude Code, widely regarded as a leader in the automated coding market, currently runs primarily on chips designed by Amazon's AWS and Alphabet's Google Cloud rather than on NVIDIA products. Facing this encroachment, Jensen Huang emphasized in an interview with wccftech that NVIDIA is transforming from a pure chip supplier into the builder of a complete AI ecosystem spanning semiconductors, data centers, cloud, and applications. For investors, next month's GTC conference will be the key test of whether NVIDIA can defend its roughly 90% market share in the inference era.