Amazon deploys Cerebras chips, focusing on its "ultra-fast inference solutions"

Wallstreetcn
2026.03.14 03:52

Amazon Web Services has reached a multi-year agreement with chip startup Cerebras to deploy Cerebras chips alongside its self-developed Trainium chips in AWS data centers, providing high-speed AI inference services. Cerebras claims its chips are 25 times faster than NVIDIA GPUs, and the collaboration will help the startup reach a large number of cloud customers.

Amazon Web Services and chip startup Cerebras announced a multi-year partnership agreement to deploy Cerebras chips in its data centers for AI inference computing.

On Friday, March 13, according to the agreement announced by both parties, Amazon Web Services will combine Cerebras chips with its self-developed Trainium chips to provide faster inference computing services.

This marks another endorsement for the startup, following the more-than-$10 billion agreement OpenAI signed with Cerebras in January of this year.

Cerebras touts its chips as an "ultra-fast inference solution," stating that they can handle "decoding," the inference phase in which an AI model generates its response to a user query, at speeds 25 times faster than NVIDIA's GPUs.

This collaboration is significant for Cerebras's business landscape. Cerebras CEO Andrew Feldman stated:

More and more people are using artificial intelligence, with increasing frequency, and using it to solve more complex problems. This allows the Cerebras-Trainium solution to connect to the largest cloud platforms, giving us the opportunity to reach a large number of customers.

The Rise of the Inference Market, Pressure on GPU Dominance

The focus of computing power demand in the AI industry is quietly shifting.

As the user base of AI tools and agents rapidly expands, demand for training-phase computing power is nearing saturation while inference computing grows ever more important. Enterprises increasingly recognize that although GPUs excel at model training, they are not always the optimal choice for inference workloads that demand the fastest possible response times, prompting buyers to diversify their chip suppliers.

As the world's largest cloud service provider, AWS has previously relied on its Annapurna Labs semiconductor business to design Trainium chips to provide computing power support for data centers.

The introduction of Cerebras chips aims to address Trainium's limitations in high-speed inference scenarios and to create a tiered pricing scheme for inference products: slower, Trainium-only service at a lower price, with the Cerebras-Trainium combination positioned as the premium tier.

Annapurna Labs co-founder and AWS vice president Nafea Bshara said the company's goal is to "continuously improve speed and reduce prices." Feldman put it bluntly:

If you want fast token output, if speed is critical to you, if you are working with code or agents, we are not only the absolute fastest, but we also aim to set the industry standard.

Increasing Pressure on NVIDIA, Expansion of Custom Chip Forces

This deal is a reflection of the intensifying competition faced by NVIDIA.

Custom processor designers are eroding NVIDIA's market share by breaking through in specialized scenarios, and customer demands for faster speeds and lower costs are also forcing the chip giant to accelerate product iteration.

According to reports, NVIDIA signed a $20 billion licensing agreement with chip startup Groq last December and plans to soon release a new processing system, built on Groq technology, aimed specifically at inference workloads.

For Cerebras, the collaboration with AWS occurs at a critical juncture in its rapid business expansion.

In February of this year, Cerebras announced the completion of a new funding round of $1 billion, bringing its total funding to $2.6 billion, with a post-money valuation of approximately $23 billion.

In January of this year, ChatGPT developer OpenAI signed an agreement worth over $10 billion to deploy Cerebras chips to power its flagship chatbot, with plans to deploy up to 750 megawatts of Cerebras computing capacity.

The startup has garnered support from leading institutional investors such as Fidelity Management, Atreides Management, Benchmark, Tiger Global, and Coatue, though it previously struggled to raise funding.

Cerebras submitted an IPO application in September 2024 but withdrew its listing documents about a year later. There is no word yet on when the listing plan will be revived.