NVIDIA is being cornered: OpenAI releases its first model running on Cerebras chips

OpenAI released GPT-5.3-Codex-Spark on Thursday, a model designed specifically for real-time coding. It is a streamlined version of OpenAI's latest code automation software, Codex, and the first product of the agreement worth more than $10 billion that OpenAI signed with Cerebras last month. An OpenAI spokesperson said that the partnership with NVIDIA is "foundational" and that the company will continue to evaluate the most cost-effective chips across all use cases, with GPUs remaining the preferred choice for cost-sensitive, throughput-oriented research and inference workloads.

OpenAI is reducing its reliance on NVIDIA and has released its first AI model running on Cerebras Systems chips this Thursday, marking a key step in the AI star's strategy to diversify its suppliers. This move comes as OpenAI's relationship with NVIDIA is delicate, with reports suggesting that the $100 billion partnership announced last fall is now stalled.

GPT-5.3-Codex-Spark is designed for real-time coding and is a streamlined version of OpenAI's latest code automation software, Codex, trading some capability for faster response times. OpenAI claims that this model generates output 15 times faster than its predecessors, producing over 1,000 tokens per second. It is the first product of the agreement worth more than $10 billion that OpenAI and Cerebras signed in January 2026.
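To put the throughput claim in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch: the predecessor's rate is assumed to be the claimed rate divided by 15, and the 2,000-token response length is a hypothetical figure chosen for illustration. Neither number comes from OpenAI.

```python
# Back-of-the-envelope: how long a response takes to stream at each rate.
SPARK_TOKENS_PER_SEC = 1000.0  # claimed: "over 1,000 tokens per second"
SPEEDUP = 15.0                 # claimed: "15 times faster"
RESPONSE_TOKENS = 2000         # hypothetical response length (assumption)


def generation_seconds(tokens: int, tokens_per_sec: float) -> float:
    """Time to stream `tokens` tokens at a constant generation rate."""
    return tokens / tokens_per_sec


# Assumption: the predecessor's rate is simply the claimed rate divided by 15.
baseline_rate = SPARK_TOKENS_PER_SEC / SPEEDUP

spark_s = generation_seconds(RESPONSE_TOKENS, SPARK_TOKENS_PER_SEC)
baseline_s = generation_seconds(RESPONSE_TOKENS, baseline_rate)

print(f"Codex-Spark: {spark_s:.1f} s, implied predecessor: {baseline_s:.1f} s")
```

Under these assumptions, the gap is between a response that streams in a couple of seconds and one that takes roughly half a minute, which is the difference the "real-time" framing rests on.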

The model is initially available to ChatGPT Pro subscribers as a research preview and is offered through the Codex application, command line interface, and Visual Studio Code extension. OpenAI states that Codex currently has over 1 million weekly active users, with downloads exceeding 1 million in the past ten days.

This release highlights the competitive pressure OpenAI faces in the AI coding assistant market. The company is contending with fierce competition from rivals like Google and Anthropic, while also dealing with controversies such as the dissolution of an internal safety team, departures of researchers, and the introduction of ads in ChatGPT.

Speed Improvement Accompanied by Performance Trade-offs

Codex-Spark is OpenAI's first model designed specifically for real-time coding collaboration. The company claims a 15-fold increase in generation speed but has declined to provide specific latency metrics, such as time to first token or tokens per second.

"We cannot share specific latency numbers, but Codex-Spark has been optimized to provide an almost instantaneous feel—achieving 15 times faster generation speed while maintaining a high capability for real coding tasks," an OpenAI spokesperson stated.

This speed improvement comes at the cost of capability. On SWE-Bench Pro and Terminal-Bench 2.0, two industry benchmarks that assess AI systems' ability to autonomously execute complex software engineering tasks, Codex-Spark lags behind the full GPT-5.3-Codex model. OpenAI positions this as an acceptable trade-off: developers receive responses fast enough to maintain creative flow, even if the underlying model cannot handle the most complex multi-step programming challenges.

The model features a context window of 128,000 tokens and supports only text input, with no support for image or multimodal input. A small number of enterprise partners will gain API access to evaluate integration possibilities. OpenAI plans to expand access in the coming weeks based on actual workloads.

Cerebras Hardware Eliminates Traditional GPU Cluster Bottlenecks

The technical architecture behind Codex-Spark reflects the increasing importance of inference economics as AI companies scale consumer-facing products. Cerebras' third-generation wafer-scale engine is a chip about the size of a dinner plate, containing 4 trillion transistors, which eliminates much of the communication overhead that arises when AI workloads are distributed across clusters of smaller processors. For training large-scale models, distributed methods are still necessary, and NVIDIA GPUs excel in this regard. However, for inference, the process of generating responses to user queries, Cerebras believes its architecture can deliver results with significantly lower latency. Sean Lie, Chief Technology Officer and co-founder of Cerebras, views this collaboration as an opportunity to reshape how developers interact with AI systems.

"What excites us most about GPT-5.3-Codex-Spark is the collaboration with OpenAI and the developer community to explore the possibilities brought by fast inference—new interaction modes, new use cases, and fundamentally different model experiences," Lie stated in a press release. "This preview is just the beginning."

OpenAI's infrastructure team has not limited its optimization efforts to Cerebras hardware. The company announced latency improvements across the entire inference stack that benefit all Codex models, including persistent WebSocket connections and optimizations within the Responses API. OpenAI reports an 80% reduction in client-server round-trip overhead, a 30% reduction in per-token overhead, and a 50% reduction in time to first token.
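How much those three reductions matter for a given request depends on the mix of round trips and generated tokens. The sketch below illustrates this with a simple additive model. The model itself, the `StackOverheads` class, and every baseline figure in milliseconds are assumptions made for illustration. It also treats the per-token figure as time overhead. Only the three percentages come from OpenAI's announcement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StackOverheads:
    """Hypothetical serving-stack overheads, in milliseconds."""
    round_trip_ms: float   # client-server overhead per round trip
    per_token_ms: float    # stack overhead per generated token
    first_token_ms: float  # time to first token

    def total_ms(self, round_trips: int, tokens: int) -> float:
        # Assumption: the three components simply add up.
        return (self.round_trip_ms * round_trips
                + self.first_token_ms
                + self.per_token_ms * tokens)


def apply_reported_reductions(before: StackOverheads) -> StackOverheads:
    """Apply the reported 80% / 30% / 50% reductions to each component."""
    return StackOverheads(
        round_trip_ms=before.round_trip_ms * (1 - 0.80),
        per_token_ms=before.per_token_ms * (1 - 0.30),
        first_token_ms=before.first_token_ms * (1 - 0.50),
    )


# Hypothetical baseline; none of these figures are published by OpenAI.
before = StackOverheads(round_trip_ms=100.0, per_token_ms=1.0, first_token_ms=400.0)
after = apply_reported_reductions(before)

print(f"before: {before.total_ms(round_trips=5, tokens=500):.0f} ms")
print(f"after:  {after.total_ms(round_trips=5, tokens=500):.0f} ms")
```

In this toy model, sessions with many short round trips gain the most from the round-trip reduction, while long generations are dominated by the per-token term.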

The $100 Billion NVIDIA Deal Appears to Have Stalled

The Cerebras collaboration is all the more significant given the increasingly complicated relationship between OpenAI and NVIDIA.

In September 2025, NVIDIA and OpenAI announced a letter of intent to establish a strategic partnership, where OpenAI would utilize NVIDIA's systems to build and deploy at least 10 gigawatts (GW) of AI data centers, using millions of NVIDIA graphics processing units (GPUs) to train and deploy OpenAI's next-generation AI models, while NVIDIA planned to invest up to $100 billion in OpenAI. This is NVIDIA's largest investment commitment to date.

The strategic partnership announcement seemed to solidify the alliance between the world's most valuable AI company and the dominant chip supplier.

Five months later, multiple reports indicated that the massive deal has substantially stagnated. NVIDIA CEO Jensen Huang publicly denied any tension, telling reporters in late January that "there's no drama," and that NVIDIA remains committed to participating in OpenAI's current funding round. However, the relationship between the two parties has clearly cooled, with commentary suggesting friction stems from multiple factors.

OpenAI is actively seeking partnerships with alternative chip suppliers, including the deal with Cerebras and agreements with AMD and Broadcom. In October 2025, OpenAI reached a significant agreement with NVIDIA competitor AMD to deploy 6GW of AMD GPUs over several years. Later that month, OpenAI agreed to purchase custom chips and networking components from Broadcom.

From NVIDIA's perspective, OpenAI may be leveraging its influence to commoditize the hardware that makes its AI breakthroughs possible. From OpenAI's perspective, reducing reliance on a single supplier represents a prudent business strategy. An OpenAI spokesperson told the media this Thursday, "We will continue to work with the ecosystem to continuously assess the most cost-effective chips for all use cases," adding, "For cost-sensitive applications such as research and inference that prioritize throughput, GPUs remain our preferred choice."

This statement reflects a cautious effort to avoid angering NVIDIA while retaining flexibility, and it underscores the need for the massive parallel processing power provided by NVIDIA GPUs to train cutting-edge AI models.

The spokesperson's statement on Thursday also described OpenAI's partnership with NVIDIA as "foundational" and said that OpenAI's most powerful AI models are the result of "years of collaboration in hardware and software engineering" between the two companies. "That's why we consider NVIDIA to be at the core of our training and inference stack, while intentionally expanding the ecosystem around it through partnerships with Cerebras, AMD, and Broadcom."

Internal Turmoil Intensifies External Scrutiny

As Codex-Spark is released, OpenAI is grappling with a series of internal challenges that have intensified external scrutiny of the company's direction and values. Reports this week indicated that OpenAI has disbanded its mission alignment team, which was established in September 2024 to advance the company's goal of ensuring that artificial general intelligence benefits humanity. The seven members of the team have been reassigned to other roles, and the head, Joshua Achiam, has received the new title of "Chief Futurist."

In 2024, OpenAI disbanded another safety-focused team, the Superalignment team, which concentrated on the long-term existential risks posed by AI. The pattern of disbanding safety-oriented teams has drawn criticism from researchers who believe that OpenAI's commercial pressures are overshadowing its original nonprofit mission.

The company is also facing the repercussions of its decision to introduce advertising in ChatGPT. Researcher Zoë Hitzig resigned this week over what she described as the "slippery slope" of ad-supported AI, warning in The New York Times that the archived intimate user conversations in ChatGPT create unprecedented opportunities for manipulation. Anthropic took the opportunity to advertise during the Super Bowl with the slogan: "Ads are coming to AI. But not to Claude."

Additionally, the company has agreed to provide ChatGPT to the Pentagon through GenAI.mil, a new project from the U.S. Department of Defense that requires OpenAI to allow "all lawful uses" without restrictions imposed by the company. Anthropic reportedly rejected these terms. There are also reports that Ryan Beiermeister, OpenAI's Vice President of Product Policy, who had expressed concerns about the planned explicit content feature, was fired in January over discrimination allegations, which she denies.

Competition in the AI Coding Assistant Market Intensifies

Despite the surrounding turmoil, OpenAI's technical roadmap for Codex remains ambitious. OpenAI envisions a coding assistant that seamlessly combines rapid interactive editing with long-running autonomous tasks: an AI that can handle quick fixes while simultaneously coordinating multiple agents to tackle more complex issues in the background. An OpenAI spokesperson told the media: "Over time, these models will converge. Codex allows you to maintain a tight interactive loop while delegating long-running tasks to backend sub-agents, or parallelizing tasks across multiple models when you need breadth and speed, so you don't have to pre-select a single mode."

This vision requires not only faster inference speeds but also complex task decomposition and coordination between models of different scales and capabilities. Codex-Spark establishes a low-latency foundation for the interactive part of this experience; future versions will need to provide autonomous reasoning and multi-agent coordination capabilities to realize the complete vision.

Currently, Codex-Spark operates under a separate rate limit from other OpenAI models, reflecting the limited capacity of Cerebras infrastructure during the research preview. "Because it runs on dedicated low-latency hardware, it is subject to separate rate limit controls, which may be adjusted based on demand during the research preview," the spokesperson noted. These limits are designed to be "generous," and OpenAI monitors usage patterns when deciding how to scale.

The release of Codex-Spark comes amid fierce competition in AI-driven development tools. Anthropic's Claude Cowork product triggered a sell-off in traditional software stocks last week as investors considered whether AI assistants might replace traditional enterprise applications. Microsoft, Google, and Amazon continue to invest heavily in AI coding capabilities integrated with their respective cloud platforms.

Since its launch ten days ago, OpenAI's Codex application has shown rapid adoption momentum, with weekly active users increasing by 60% week-over-week. Currently, over 325,000 developers are actively using Codex at both free and paid tiers. However, a fundamental question facing OpenAI and the broader AI industry is whether the speed improvements promised by Codex-Spark can translate into meaningful productivity gains or merely create a more pleasant experience without changing outcomes.

The Cerebras deal is a calculated bet: dedicated hardware can unlock use cases that general-purpose GPUs cannot serve economically. For a company that is simultaneously battling competitors, managing strained supplier relationships, and addressing internal dissent regarding its business direction, it serves as a reminder that standing still is not an option in the AI race. OpenAI has built a reputation for acting quickly and breaking the mold. Now it must prove it can act even faster without undermining itself.