Cursor's "self-developed" model beats Opus 4.6 at one-tenth the price; netizens mock it as a "Kimi 2.5 wrapper," with Musk's seal of approval

Wallstreetcn
2026.03.21 05:35

Cursor released a new model, Composer 2, claiming performance beyond Opus 4.6 at a sharply lower price, but it was quickly exposed as being built on Moonshot AI's Kimi K2.5, drawing ridicule from Musk and netizens for shipping a "wrapper." Cursor's co-founder later apologized, admitting the base model had not been clearly disclosed, while stressing that extensive reinforcement learning had been done on top of it; Moonshot AI confirmed the collaboration is a compliant, licensed one and offered its congratulations.

AI programming tool Cursor has officially released its "self-developed" model Composer 2, claiming performance surpassing Claude Opus 4.6 at sharply lower prices. Within less than three hours, however, developers revealed that its underlying foundation is actually Kimi K2.5, an open-source model from Chinese AI company Moonshot AI (known in Chinese as "Dark Side of the Moon").

The "self-developed" controversy quickly swept the AI community, with Elon Musk personally weighing in to confirm the finding, ultimately drawing an apology from Cursor's co-founder and a congratulatory message from Kimi's official account.

On March 21, according to Hard AI, Cursor co-founder Aman Sanger acknowledged in a post after the incident escalated: "It was our oversight not to mention the Kimi foundation model in the blog from the start, and we will correct this with the next model."

Moonshot AI's official account promptly responded: "Congratulations to Cursor on the launch of Composer 2. We are proud to see Kimi K2.5 as the foundation model; this is what we love about the open-source ecosystem." Moonshot AI also clarified that Cursor accesses Kimi K2.5 through a reinforcement learning and inference platform hosted by Fireworks AI, as part of a licensed commercial collaboration.

Performance surpassing Opus 4.6, prices slashed to a tenth

Cursor officially launched Composer 2 this Friday, claiming in its release blog that the model has achieved significant improvements across all benchmark tests it measured, including Terminal-Bench 2.0 and SWE-bench Multilingual.

On Terminal-Bench 2.0, which measures an agent's ability to operate in a terminal, Composer 2 lands between GPT-5.4 and Claude Opus 4.6, while on Cursor's own CursorBench benchmark its cost-performance clearly exceeds both models.

Pricing is the core selling point of this release. The standard Composer 2 is priced at $0.5 per million input tokens and $2.5 per million output tokens, a drastic cut relative to Claude Opus 4.6.

Cursor also launched a faster variant, Composer 2 Fast, priced at $1.5 per million input tokens and $7.5 per million output tokens, maintaining its price advantage while emphasizing response speed.
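At per-million-token rates, the cost of a request is simple arithmetic. A minimal sketch using the rates quoted above; the model keys and token counts are illustrative, not Cursor's actual API:

```python
# Per-million-token rates quoted in the article (USD); model keys are illustrative.
PRICES = {
    "composer-2":      {"input": 0.5, "output": 2.5},
    "composer-2-fast": {"input": 1.5, "output": 7.5},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed per-million-token rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: a 50k-token prompt producing a 5k-token completion.
print(request_cost("composer-2", 50_000, 5_000))       # 0.0375
print(request_cost("composer-2-fast", 50_000, 5_000))  # 0.1125
```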

Cursor attributes the breakthrough in cost-performance to a new reinforcement learning method, stressing that it is "a genuinely trained capability, not an inference-time trick."

Exposed in under 3 hours: the underlying base revealed

The highlight moment of Composer 2 was short-lived, however. Less than three hours after release, X user @fynnso discovered that the model's internal ID read kimi-k2p5-rl-0317-s515-fast and concluded that "Composer 2 is actually just Kimi K2.5 enhanced through reinforcement learning."

This discovery quickly spread across technical communities such as X and Hacker News, with memes and discussions flourishing. Musk also directly replied to @fynnso's post with "Yeah, it's Kimi 2.5," further amplifying the topic's popularity.

Discussions in the Reddit community r/singularity were equally heated. One user commented:

"The funniest part is that everyone is praising Composer 2 as a huge leap, but it turns out they were using someone else's model the whole time. It makes you wonder how many so-called 'proprietary models' are actually just open-source fine-tuned versions with a logo slapped on."

Others argued that Cursor's true moat lies in the task-solving data accumulated from its huge developer base rather than in pre-training itself: "Every investor knows they are not building their own foundation model; they should have been honest about it from the start."

Cursor Apologizes, Kimi Confirms Authorized Collaboration

Facing public pressure, the Cursor team responded head-on.

Aman Sanger publicly confirmed that the team ran perplexity evaluations on multiple foundation models and that Kimi K2.5 "proved to be the strongest"; the team then layered continued pre-training and high-compute reinforcement learning at four times that scale on top of it, deploying through Fireworks AI's inference and RL sampler.
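Sanger did not publish the evaluation itself. As background, perplexity-based model selection scores each candidate by how well it predicts a held-out corpus; a minimal sketch, with hypothetical model names and per-token log-probabilities:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood) over a held-out corpus."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Hypothetical per-token log-probs each candidate base assigns to the same
# evaluation text; the model names and numbers are made up for illustration.
candidates = {
    "base-a": [-1.9, -2.3, -1.7, -2.0],
    "base-b": [-1.2, -1.5, -1.1, -1.4],  # higher log-probs -> lower perplexity
}

best = min(candidates, key=lambda name: perplexity(candidates[name]))
print(best)  # base-b wins: it assigns the eval text higher probability
```

The lowest-perplexity candidate is the one that models the evaluation text best, which is the sense in which a base can "prove to be the strongest."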

Lee Robinson, Cursor's Vice President of Developer Education, disclosed further technical detail: in the final model, roughly a quarter of the compute comes from the foundation model, with the remaining three quarters from Cursor's own training.

Robinson also said that although Composer 2 was built on an open-source model, the team plans to conduct complete pre-training of its own in the future.

Moonshot AI's official statement emphasized that the collaboration meets the license's requirements and constitutes an authorized commercial partnership, and congratulated Cursor on the release of Composer 2.

With that, the legal and licensing questions in the controversy have largely been settled, but Cursor's decision to sidestep the base model's identity at launch still leaves ripples in the developer community.

"Note-taking" Reinforcement Learning: Cursor's Technical Narrative

Despite the controversy over the source of the base, Cursor's work on the technical front still holds independent value.

Cursor detailed its core method in a blog post: a reinforcement learning mechanism called "self-summary," aimed at a long-standing pain point of AI coding assistants, which tend to drift off course on very long, complex tasks because of limited context windows.

Specifically, when a task in progress hits a fixed token-count trigger, the model proactively pauses and generates a "phase summary," then continues the task from the compressed context. This summarization ability is folded into the reinforcement learning reward: the better the summary and the higher the downstream task success rate, the larger the reward; poor summaries are penalized.
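Cursor has not published the implementation. A toy sketch of the pause-and-summarize loop described above; the token threshold and the `summarize()` stand-in are illustrative (in Cursor's account the summarizer is the model itself, and its quality plus downstream success feeds the RL reward):

```python
# Toy sketch of the pause-and-summarize loop; all values are illustrative.
TRIGGER_TOKENS = 8  # the real trigger is a fixed token budget

def count_tokens(context):
    # Crude whitespace "tokenizer" for the sketch.
    return sum(len(msg.split()) for msg in context)

def summarize(context):
    # Stand-in phase summary: keep the first word of every message.
    return ["SUMMARY: " + " ".join(msg.split()[0] for msg in context)]

def run(steps):
    context = []
    for msg in steps:
        context.append(msg)
        if count_tokens(context) > TRIGGER_TOKENS:
            context = summarize(context)  # compress, then keep advancing
    return context

print(run(["open file foo", "read contents of foo", "edit line three"]))
# → ['SUMMARY: open read edit']
```

In the trained system, the reward would score both the summary's fidelity and whether the task still succeeds from the compressed context, which is what distinguishes this from ordinary rolling summarization.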

Internal test data disclosed by Cursor shows that the method uses only one-fifth as many tokens as traditional summarization, while roughly halving the errors introduced by compression.

Taking the notoriously hard task of "running Doom on the MIPS architecture" as an example, Composer 2 found the correct solution after 170 rounds of interaction, compressing over 100,000 tokens of context down to about 1,000.

Open Source Ecosystem and Transparency Debate

The deeper discussion of this incident points to the mutual trust issues between the AI application layer and the open-source ecosystem.

Clement Delangue, co-founder and CEO of Hugging Face, read the episode as a vindication of open source, saying that China's open-source models have become the greatest single force shaping the global AI technology stack.

Competitor Windsurf quickly seized the moment, announcing that it will offer Kimi K2.5 free to users for the next week in a bid to poach Cursor users. Analysts note that the turmoil piles extra public-opinion pressure on Cursor at a critical fundraising juncture: the company is reportedly seeking a new round at a $50 billion valuation.

Cursor co-founder Aman Sanger has previously said that Cursor is a new kind of company, "neither a pure application developer nor a model provider."

This incident shows that as open-source foundations close in on top proprietary models, how downstream application vendors balance commercial packaging against technical transparency will become an unavoidable question for the industry.