Igniting a price war in China's large model market! How did a "quantitative giant" become the "Pinduoduo (PDD) of the AI world"?
AI expertise accumulated from quantitative investing, a ten-thousand-GPU computing power advantage, and the unique "mixture of experts" architecture
Author: Zhao Ying
Source: Hard AI
Priced at 2 yuan per million output tokens, the DeepSeek-V2 large model triggered an industry "price war" within a week of its launch.
ByteDance cut its price to 0.6 yuan per million output tokens, Alibaba then slashed the prices of some of its large models by 97%, and Baidu made two Wenxin Yiyan models free to use...
The "dark horse of large models" DeepSeek, founded by the well-known private equity giant MagicQuant, not only has the strongest performance among many open source models with its DeepSeek-V2, but also has the lowest price in the industry.
How did DeepSeek become the "Pinduoduo of the AI world"?
One of the earliest explorers of AI-driven quantitative investing
Behind DeepSeek stands MagicQuant. Since its founding in 2015, MagicQuant has grown into a large asset manager with roughly 60 billion yuan under management, using AI and algorithms to identify patterns and variables that may affect stock prices.
MagicQuant was originally founded by Liang Wenfeng in an apartment in Chengdu. A computer science graduate of Zhejiang University, he was focused at the time on automating stock trading.
By 2021, all of MagicQuant's strategies were using artificial intelligence. Cai Liyu, a managing director at MagicQuant, has said that artificial intelligence helps extract valuable information from massive amounts of data, which can be used to predict stock prices and inform investment decisions.
Now, leveraging its accumulated AI expertise and infrastructure, MagicQuant has built the MoE large model DeepSeek-V2, which experts say is comparable in capability to mainstream large models in the United States.
The launch of DeepSeek-V2 quickly drew widespread attention in the industry. The model can answer questions, write code, and reason, yet it costs significantly less than its competitors: only about 2 yuan per million output tokens.
This price advantage has sparked a "price war" among Chinese AI large models, with ByteDance, Alibaba, and Baidu all cutting the prices of their AI services, underscoring the fierce competition in the Chinese AI market.
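To put these per-million-token prices in perspective, the back-of-the-envelope calculation below converts them into a rough monthly bill. The unit prices are the ones quoted in this article; the workload of one billion output tokens per month is a hypothetical assumption used purely for illustration.

```python
# Back-of-the-envelope cost comparison using the per-million-output-token prices
# quoted in this article. The 1-billion-token monthly workload is a hypothetical
# assumption for illustration, not a figure from any vendor.
prices_yuan_per_million_tokens = {
    "DeepSeek-V2": 2.0,           # price quoted at launch
    "ByteDance (post-cut)": 0.6,  # price quoted after the cut
}

monthly_output_tokens = 1_000_000_000  # hypothetical workload

for model, price in prices_yuan_per_million_tokens.items():
    cost = monthly_output_tokens / 1_000_000 * price
    print(f"{model}: {cost:,.0f} yuan/month")
# DeepSeek-V2: 2,000 yuan/month
# ByteDance (post-cut): 600 yuan/month
```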
Massive computing power advantage
DeepSeek's outstanding model capabilities are inseparable from ample computing power.
The company's first computing cluster, "Firefly No. 1," cost nearly 200 million yuan. MagicQuant is investing about 1 billion yuan to build the second supercomputing cluster, "Firefly No. 2," which is the size of a football field. Cai Liyu added that most of MagicQuant's profits are invested in artificial intelligence infrastructure.
According to the company's website, the second cluster has been completed, connecting more than 10,000 NVIDIA GPUs and accompanying storage and providing DeepSeek with enough computing power to train large models.
According to a Guosheng Securities report, MagicQuant is one of six companies in China with more than 10,000 A100 processors, which is generally considered the computing power threshold for training large models.

In addition, the DeepSeek model is open source, allowing researchers to inspect its architecture and replicate it. The architecture of DeepSeek-V2 is considered highly distinctive: it adopts the "mixture of experts" (MoE) approach, dividing the model into smaller modules, which improves processing efficiency and accuracy.
Andrew Carr, Chief Scientist of Cartwheel, an AI animation startup based in the United States, said that DeepSeek takes the "mixture of experts" idea to the extreme, dividing the model into smaller blocks, each with hundreds of small experts.
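For readers who want a concrete picture of what "splitting the model into many small experts" means, below is a minimal sketch of a generic mixture-of-experts layer in PyTorch. It is not DeepSeek-V2's actual implementation; the dimensions, the 16-expert pool, and the top-2 router are illustrative assumptions chosen only to show how a router activates a few small experts per token instead of one large feed-forward block.

```python
# A minimal, generic sketch of a mixture-of-experts (MoE) layer, for illustration only.
# This is NOT DeepSeek-V2's actual implementation; expert count, sizes, and the top-k
# router are assumptions chosen to show the idea of splitting one large feed-forward
# block into many small experts and activating only a few of them per token.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SmallExpert(nn.Module):
    """One small feed-forward 'expert'."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.GELU(),
            nn.Linear(d_hidden, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class MoELayer(nn.Module):
    """Routes each token to its top-k experts and mixes their outputs."""

    def __init__(self, d_model: int = 64, d_hidden: int = 128,
                 num_experts: int = 16, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            SmallExpert(d_model, d_hidden) for _ in range(num_experts)
        )
        self.router = nn.Linear(d_model, num_experts)  # scores every expert per token
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        scores = self.router(x)                              # (tokens, experts)
        weights, indices = scores.topk(self.top_k, dim=-1)   # keep only k experts per token
        weights = F.softmax(weights, dim=-1)

        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e                  # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out


if __name__ == "__main__":
    layer = MoELayer()
    tokens = torch.randn(8, 64)      # 8 token embeddings
    print(layer(tokens).shape)       # torch.Size([8, 64]); only 2 of 16 experts run per token
```

The efficiency argument in the article follows from this structure: although the total parameter count spans all experts, each token only pays the compute cost of the few experts its router selects.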