NVIDIA's AI chip rival is here! AMD launches MI300X, capable of running models with up to 80 billion parameters.
The MI300X has a higher HBM density than NVIDIA's H100 AI chip, with HBM bandwidth up to 1.6 times that of the H100, and can run larger models than the H100. Other new AMD products have attracted Silicon Valley giants: Amazon's cloud unit is using fourth-generation EPYC processors to create instances; Microsoft Azure has launched new instances with Genoa-X CPUs; and Meta plans to use AMD's new cloud chip, Bergamo.
With the release of new products, AMD has officially challenged NVIDIA's AI chip dominance.
On Tuesday, June 13 (Eastern Time), AMD held a product launch event. The most important product was AMD's most advanced GPU, the Instinct MI300, which is used for training large models.
AMD CEO Lisa Su stated that generative AI and large language models (LLMs) require significantly more computing power and memory. She expects the market for data center AI accelerators to reach around $30 billion this year and to exceed $150 billion by 2027, a compound annual growth rate of over 50%.
Su presented the Instinct MI300A, which AMD describes as the world's first accelerator purpose-built for both AI and high-performance computing (HPC). It has 146 billion transistors spread across 13 chiplets.
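The growth rate implied by these two forecast figures can be checked with a short calculation. This is only a sketch: the helper name `implied_cagr` is illustrative, and it assumes four compounding periods between 2023 and 2027.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1 / years) - 1


# Forecast figures quoted in the article, in billions of US dollars.
market_2023 = 30.0
market_2027 = 150.0
years = 2027 - 2023  # assumption: four compounding periods

cagr = implied_cagr(market_2023, market_2027, years)
print(f"Implied CAGR: {cagr:.1%}")
```

Under these assumptions, growing from exactly $30 billion to exactly $150 billion implies a rate just under 50% a year, so a market that exceeds $150 billion is consistent with the quoted rate of over 50%.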
It uses the CDNA 3 GPU architecture and 24 Zen 4 CPU cores, with 128GB of HBM3 memory. Compared with the previous-generation MI250, the MI300 delivers eight times the performance and five times the efficiency. Separately, AMD said at the event that its new Zen 4c core is denser than the standard Zen 4 core: it is 35% smaller while maintaining 100% software compatibility.
AMD also launched a GPU-only version of the MI300, the MI300X, which is optimized for LLMs and has 192GB of HBM3 memory, 5.2TB/s of memory bandwidth, and 896GB/s of Infinity Fabric bandwidth. AMD has integrated 153 billion transistors into 12 5nm chiplets.
AMD claims that the HBM density provided by MI300X is 2.4 times that of NVIDIA's AI chip H100, and its HBM bandwidth is 1.6 times that of H100. This means that AMD's chip can run larger models than NVIDIA's chip.
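The two ratios AMD cites can be roughly reproduced from the headline specifications. The MI300X numbers below come from the article; the H100 numbers (80GB of HBM and 3.35TB/s for the SXM version) are not stated in the article and are assumed here from NVIDIA's published specifications.

```python
# MI300X figures as quoted in the article.
mi300x_hbm_gb = 192
mi300x_bandwidth_tbs = 5.2

# Assumption: H100 SXM figures, taken from NVIDIA's public specs, not from the article.
h100_hbm_gb = 80
h100_bandwidth_tbs = 3.35

capacity_ratio = mi300x_hbm_gb / h100_hbm_gb
bandwidth_ratio = mi300x_bandwidth_tbs / h100_bandwidth_tbs

print(f"HBM capacity ratio:  {capacity_ratio:.1f}x")
print(f"HBM bandwidth ratio: {bandwidth_ratio:.1f}x")
```

With these assumed H100 figures, the capacity ratio comes out at 2.4 and the bandwidth ratio rounds to 1.6, matching AMD's claims.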
Lisa Su demonstrated a Hugging Face model with 40 billion parameters running on the MI300X, having the LLM write a poem about San Francisco. AMD said this was the first time a model of that size had been run on a single GPU, and that a single MI300X can run models with up to 80 billion parameters.
Needing fewer GPUs to run an LLM directly benefits developers by reducing costs.
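A back-of-the-envelope estimate shows why 192GB of memory makes an 80-billion-parameter model plausible on one GPU. This is a rough sketch, not AMD's own sizing method: it assumes 16-bit weights (2 bytes per parameter), counts only the weights (ignoring activations and the key-value cache), and assumes 80GB for the H100. The helper names are illustrative.

```python
import math


def weights_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    """Weight memory in GB (1 GB = 1e9 bytes); 2 bytes per parameter means FP16."""
    return params_billion * bytes_per_param


def min_gpus(params_billion: float, hbm_gb: float, bytes_per_param: int = 2) -> int:
    """Fewest GPUs whose combined HBM can hold the weights alone."""
    return math.ceil(weights_gb(params_billion, bytes_per_param) / hbm_gb)


MI300X_HBM_GB = 192  # from the article
H100_HBM_GB = 80     # assumption: not stated in the article

for size in (40, 80):
    print(
        f"{size}B params: {weights_gb(size):.0f} GB of weights, "
        f"MI300X x{min_gpus(size, MI300X_HBM_GB)}, "
        f"H100 x{min_gpus(size, H100_HBM_GB)}"
    )
```

Under these assumptions an 80-billion-parameter model needs about 160GB for its weights, which fits within a single MI300X but would have to be split across at least two 80GB GPUs. Real deployments need additional headroom beyond the weights.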
AMD also released the AMD Instinct platform, which has eight MI300Xs and uses industry-standard OCP design, providing a total of 1.5TB of HBM3 memory.
Lisa Su said that the MI300A, which combines CPU and GPU, is now available, and that the MI300X and the eight-GPU Instinct platform will begin sampling in the third quarter of this year and launch officially in the fourth quarter.
Amazon, Microsoft, and Meta have adopted or will adopt AMD's new products
In addition to AI chips, AMD's event also covered the fourth-generation EPYC processor, in particular its progress in cloud instances available worldwide.
According to AMD, the fourth-generation EPYC processor is 1.8 times faster than Intel's competing processor in cloud workloads and 1.9 times faster in enterprise workloads.
AMD stated that the fourth-generation EPYC processor uses new Zen 4c cores, which are 1.9 times more efficient than the Intel Xeon 8490H. AMD also argued that, because most AI workloads run on CPUs, it holds a clear lead in CPU-based AI.
Amazon announced on Tuesday that it is using AWS Nitro and fourth-generation EPYC processors to create new instances. The EC2 M7a instance of Amazon Cloud is now available in preview, and its performance is 50% higher than that of the M6a instance.
AMD will also use the EC2 M7a instance for internal work, including EDA software for chip design. AMD also announced that Oracle will launch the Genoa E5 instance in July this year. AMD's EPYC Bergamo processor, the industry's first native x86 CPU with 128 cores and 256 threads per socket, has been released. This means a typical 2U 4-node platform with two sockets per node will have 2,048 threads.
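The thread count for a 2U 4-node platform follows from simple multiplication. The sketch below assumes each node has two sockets, which the article does not state but which is the configuration that yields the quoted total.

```python
cores_per_socket = 128   # Bergamo, from the article
threads_per_core = 2     # 256 threads / 128 cores, from the article
sockets_per_node = 2     # assumption: dual-socket nodes
nodes_per_chassis = 4    # 2U 4-node platform, from the article

threads_per_socket = cores_per_socket * threads_per_core
total_threads = threads_per_socket * sockets_per_node * nodes_per_chassis

print(f"Threads per socket:  {threads_per_socket}")
print(f"Threads per chassis: {total_threads}")
```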
Bergamo is 2.5 times more powerful than its predecessor Milan and is now available to AMD's cloud customers.
Meta representatives stated that Meta uses EPYC processors in its infrastructure and is also open to AMD-based processor designs. Meta plans to use the cloud processor Bergamo for its infrastructure and storage platform.
AMD also launched the Genoa-X CPU, available as of Tuesday, which offers more than 1GB of L3 cache on its 96-core part. It comes in four SKUs with 16 to 96 cores. Because it is compatible with the SP5 socket, it can be used with existing EPYC platforms.
Microsoft representatives joined AMD to show how EPYC processors have helped Azure HPC performance increase fourfold in four years.
Azure announced that the HBv4 and HX series instances with Genoa-X, as well as new HBv3 instances, are now available. Azure also claims that the highest performance can be increased by 5.7 times compared to the market benchmark.
AMD previously acquired DPU technology through the acquisition of Pensando. AMD claims that its P4 DPU architecture is the world's smartest DPU, reducing network costs in data centers and improving server manageability. AMD's Pensando SmartNICs are an essential part of this new data center architecture.
AMD also mentioned its own AI chip software, called ROCm. AMD President Victor Peng stated that AMD has made real progress in building a powerful software stack, and the ROCm software stack can be used with an open ecosystem of models, libraries, frameworks, and tools.