Full Interview with AMD CTO: Surging Demand for AI Inference Chips, GPU Supply Shortage Will Surely Ease

Wallstreetcn
2024.03.02 11:22

Mark Papermaster stated that power is a key limiting factor in chip development, and improving energy efficiency is absolutely AMD's top priority.

AMD has been sprinting in the AI chip frenzy, and Wall Street still enthusiastically dubs it "NVIDIA's strongest challenger." On March 1st, following a 9% surge the previous day, AMD rose over 5% again to close at a record high. The stock is up 14.8% this week and 30.6% year-to-date.

Recently, AMD CTO and Executive Vice President Mark Papermaster appeared on the podcast "No Priors: AI, Machine Learning, Technology, and Startups" to discuss AMD's strategy, its latest GPU developments, the deployment of inference chips, chip software stacks, and his views on the supply chain, giving investors a sense of what to expect from AMD in 2024.

Key points include:

  • Compared to competitors, AMD's MI300 chip offers higher performance, lower power consumption, and a smaller rack-space footprint, achieving more efficient computing.
  • AMD is committed to open source as it strengthens collaboration and innovation, continuously opening up key technologies like the ROCm software stack, allowing customers to choose independently rather than being locked into closed systems.
  • AMD ensures its products are thoroughly tested and certified on mainstream deep learning frameworks, providing high-performance, stable, and easy-to-deploy solutions.
  • AMD has seen significant demand for custom AI inference chips covering a wide range of embedded application scenarios. As this trend develops, AMD will offer more customized computing products to meet it.
  • Current GPU supply is still limited, but as the supply chain gradually improves, future supply constraints will disappear.
  • After chip production capacity, power is the critical limiting factor. All the major large language model operators are looking for power sources. For engine developers like AMD, energy efficiency must be a top priority: AMD will drive efficiency improvements in every generation of product design.
  • Moore's Law is slowing down, and AMD's heterogeneous computing can deploy suitable computing engines for different applications, such as configuring ultra-low-power AI accelerators in personal computers and embedded devices, using chip combinations as a whole, selecting the best technology nodes, and considering software stack design.
  • In the era of cloud computing, more computing workloads are shifting to servers, so AI hardware companies should make reducing latency a primary design consideration.
  • In 2024, AMD will complete AI enablement across its entire product portfolio, expecting significant deployments in the cloud, edge computing, personal computers, embedded devices, and gaming devices.

Here is the full Q&A compilation:

Q: Can you first tell us a bit about your background? You have worked on a variety of interesting things, from the iPhone and iPad to the latest generation of AMD supercomputing chips.

A: Of course. I've been with AMD for quite some time. What's truly fascinating is the timing of my entry into this industry. As a graduate in Electrical and Computer Engineering from the University of Texas, I've always been passionate about chip design, and I was fortunate to step in just as chip design was beginning to revolutionize the world. CMOS had just been put into production, so I joined IBM's first CMOS project and created some pioneering designs.

I got hands-on experience in every aspect of chip design. During my years at IBM, I held various roles driving the development of microprocessors, including the PowerPC effort, which involved collaborations with Apple and Motorola, as well as large computing chips for mainframes and high-end servers.

I gained insight into many aspects of the technology, including some server development work. Later, I moved to Apple, where Steve Jobs hired me to oversee iPhone and iPod operations, and I spent a few years there during a pivotal moment in the industry's transformation. Then came a great opportunity: I joined AMD in the fall of 2011 as Chief Technology Officer, responsible for both technology and engineering, just as Moore's Law was beginning to slow down and significant innovation was needed.

Q: Yes, I'd like to discuss that point and what we can expect in terms of computing innovation, because more transistors on a chip can't do it alone. I believe every listener has heard of AMD, but could you briefly introduce the main markets you serve?

A: AMD is a company with over 50 years of history. It was initially positioned as a second-source supplier of critical components and x86 microprocessors. Fast forward to today, and it has a very diverse portfolio. Ten years ago, when our CEO Lisa Su and I joined the company, the mission was to re-establish AMD's strong competitiveness.

Supercomputing has always been a focus for AMD. About a decade ago, we began revamping our CPU roadmap. We redesigned our engineering processes, one of which involved adopting a more modular design approach, developing reusable components that could be combined based on application needs.

We invested in developing a range of new high-performance CPUs while also striving to elevate GPU performance. Both types of processing units are crucial because supercomputing is about heterogeneous computing. It requires CPUs and GPUs working in harmony to tackle the most demanding tasks.

The world's most powerful supercomputer is powered by AMD's 3rd Gen EPYC 7A53 64-core processors and Instinct MI250X GPU accelerators. In February 2022, AMD completed its acquisition of Xilinx, a major event for the electronics industry that broadened AMD's portfolio into areas such as supercomputing, cloud computing, gaming devices, and embedded systems. AMD also acquired Pensando, further expanding its product lineup.

Q: AMD has achieved remarkable success over the past decade, especially in artificial intelligence, whose importance you have emphasized since joining the company. AI applications have changed significantly over that time, spanning not only traditional Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) but also newer architectures such as Transformer and diffusion models.

Can you tell us more about what initially caught your attention in the field of artificial intelligence? And as time went on, how did AMD start to focus more on this? What solutions did you come up with?

A: The development of artificial intelligence began long ago, with open competitions in application domains. AMD's GPUs played a crucial role in those competitions, especially in improving accuracy in image recognition and natural language processing. AMD recognized the huge opportunity in AI and devised a deliberate strategy to become a leader in the field.

Looking at AMD between 2012 and 2017, most of its revenue came from personal computers (PCs) and gaming.

So the key was to ensure the portfolio was competitive and built on system modularity. These cornerstone products had to lead the field and attract people to run high-performance applications on the AMD platform. First and foremost, we had to rebuild the CPU roadmap. That's when we released the Zen architecture, with the Ryzen series for personal computers and the EPYC series for x86 servers. This marked the beginning of the company's revenue growth and the expansion of our portfolio.

Around the same time, we saw the direction of heterogeneous computing; the concept had been proposed before I joined the company. Before Lisa joined, AMD had made a significant acquisition, buying GPU maker ATI and incorporating GPU technology into its product portfolio, which is why the company's combination of CPU and GPU technologies attracted me.

In fact, AMD is the only company that integrates CPUs and GPUs in this way. The industry needs both traditional CPU strengths for serial and scalar workloads and the massive parallel processing power of GPUs, so AMD combined them through a heterogeneous computing model to meet different types of computing needs. We started building combined CPU-GPU chips for personal computers as early as 2011, earlier than anyone else; we call them APUs (Accelerated Processing Units). Then, for big data applications, we started with HPC (high-performance computing), which is used in national laboratories and oil exploration companies. So we initially focused on large government tenders, eventually leading to AMD CPUs and GPUs powering the world's largest supercomputers.

This work started several years ago as a combined hardware and software effort. We kept building that capability until December 6, 2023, when we announced our flagship MI300 family, consisting of the GPU-only MI300X and the APU-architecture MI300A, which use HBM3 memory with capacities of 192 GB and 128 GB respectively. It is optimized for high-performance artificial intelligence applications and targets both training and inference.

So, it has been a long journey, and we are delighted that our sales are starting to take off.

Q: That's amazing. I understand that when you launched the MI300, you received purchase commitments from Meta and Microsoft. You just mentioned that you are excited about a range of applications. Can you tell us more about the applications you are most interested in or optimistic about today, as well as cloud deployment?

A: Certainly. When considering the primary application areas of artificial intelligence, you still see significant capital expenditure going toward improving the accuracy of large language models, in both training and inference. Models like ChatGPT and Bard attempt to absorb massive amounts of data during training so you can ask them anything, and that points toward the ultimate goal of artificial general intelligence.

That's where our focus lies. The MI300 is designed to start achieving this goal; it is a flagship product that can compete with the industry leaders. In fact, it already does: it is competitive in training and leading in inference, with significant performance advantages. We have created highly efficient engines for the math required for training and inference, and we also provide more memory for more efficient computing.

Compared to competitors, MI300 offers higher performance, lower power consumption, and less rack space, achieving more efficient computing.
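As a rough back-of-the-envelope illustration of why the 192 GB of HBM3 mentioned above matters for inference: a model's weight footprint is approximately parameter count times bytes per parameter. The sketch below uses that approximation; the model sizes and the weights-only assumption are illustrative, not AMD figures.

```python
# Rough sketch: which model sizes fit in a single accelerator's memory?
# Assumes weights dominate memory use and ignores KV cache / activations.

def weights_gb(num_params_billions: float, bytes_per_param: int = 2) -> float:
    """Approximate weight memory in GB for FP16/BF16 (2 bytes per parameter)."""
    return num_params_billions * 1e9 * bytes_per_param / 1e9

HBM_GB = 192  # MI300X HBM3 capacity cited in the interview

for params_b in (7, 13, 70, 180):
    gb = weights_gb(params_b)
    verdict = "fits" if gb <= HBM_GB else "needs multiple devices"
    print(f"{params_b:>4}B params ~ {gb:.0f} GB of FP16 weights -> {verdict}")
```

By this estimate, even a 70-billion-parameter model in FP16 (about 140 GB of weights) fits on one 192 GB device, which is the kind of memory headroom the interview is alluding to.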

Q: As you just pointed out, competition has many dimensions: overall performance, efficiency, software platforms, and so on. How do you think about investing in optimized math libraries? How do you want developers to understand your approach, and how do you position it against competitors?

A: That's a very good question; competition in the chip field is multifaceted. You see many startups entering, but most inference work today is done on general-purpose CPUs, while large language model workloads run almost entirely on GPUs. Given GPUs' dominant position in the software and developer ecosystem, AMD has focused on GPU development and made progress in both hardware and software. We are competitive in CPUs, with our market share growing rapidly on the strength of generation after generation of powerful parts.

However, it is only now that we have truly world-class hardware and software for GPUs. Our goal is to make GPU deployment as simple as possible, emphasizing consistent programming semantics across GPUs to make coding easier, especially for developers working with low-level code. We support all major software libraries and frameworks, including PyTorch, ONNX, and TensorFlow, and we work closely with developers to ensure our GPUs integrate seamlessly with various software environments, providing flexible and efficient tools.

Now, with competitive and leading products, you will find it very easy to deploy when using AMD. For example, AMD collaborates closely with partners like Hugging Face to ensure that their large language models are tested on the AMD platform, ensuring performance comparable to tests on other platforms like NVIDIA.

Similarly, AMD has tested against mainstream deep learning frameworks like PyTorch and is one of the few certified platforms, meaning full compatibility with those frameworks. AMD also runs regular regression tests to ensure stability and reliability across scenarios, and it actively collaborates with customers, including early adopters, to gather feedback and optimize products. This helps ensure AMD's products deploy smoothly and operate seamlessly in existing business environments.

In addition, AMD works with early partners to help them deploy their large language models (LLMs) on AMD cloud and rack configurations, meaning AMD is already working hands-on with customers to ensure products run smoothly in their environments.

At AMD's December event, other partners took the stage as well, including some very large-scale ones, demonstrating collaborations that expand AMD's partnership scope and help bring its products to a wider market. AMD also sells through many OEM channels and works directly with customers, which lets it better understand their needs and accelerate product improvements based on feedback.

This is a highly supply-constrained environment, and a lack of competition is detrimental to everyone. Without competition, an industry eventually stagnates, as the CPU market did before we brought competition back: you only get incremental improvements. The industry knows this, and we are grateful for the extensive partnerships we have established. In return, we will continue to provide competitive products generation after generation.

Q: Could you talk about the reasons, motivation, and values behind open-sourcing the ROCm software stack?

A: That's a great question. ROCm is AMD's open-source GPU computing software stack, designed to offer a portable, high-performance GPU computing platform. Open source is a crucial matter for us because we highly value a culture of collaboration and openness: sharing technology with the whole community helps drive development and innovation. AMD has a long history of commitment to open source; our LLVM-based CPU compiler work is open source, and beyond the compiler and GPU we have also opened up the ROCm stack, infrastructure that played a significant role in our supercomputing wins. We chose to support open source because we believe in the open model and treat it as one of the company's principles.

So when Xilinx and AMD merged in 2022, what I did was not just deepen the commitment to open source; more importantly, we didn't want to lock anyone in with proprietary closed software stacks. What we aim for is the victory of the best solution: we are committed to open source and to giving our customers choice.

We expect to win with the best solutions, but we won't trap customers in a specific choice. We will win with generation after generation of advantages.

Q: I believe one rapidly developing field is cloud services for AI computing. Obviously there are hyperscale cloud providers like Microsoft's Azure, Amazon's AWS, and Google's GCP, but there are also emerging players such as Baseten, Modal, and Replicate. They provide differentiated tooling, API endpoints, and so on that the hyperscalers currently lack, and part of their draw is simply having GPU resources while GPUs are in short supply. How do you see this market developing over the next 3 to 4 years? Perhaps GPUs will become more accessible, and shortages or restrictions will no longer be an issue?

A: Indeed, this is happening. I believe the supply constraints will disappear, that's part of it. We are ramping up production and deliveries quite smoothly. But more importantly, to answer your question, I think it should be considered this way: the market is rapidly expanding at an astonishing pace. As I mentioned before, most applications today start from these large-scale language models, which are primarily cloud-based, and not just cloud-based, but based on super large-scale clouds because it requires a massive cluster, not only for training but actually for inference of many types of generative language models.

But what we are seeing now is a series of applications growing non-linearly. We are witnessing a flood of people learning how to customize models, how to fine-tune them, and how to use smaller models that don't need to answer every question or support every application, only a specific professional domain in their business. This diversity makes the scale of compute and the demands on cluster configuration very rich and varied. The market is expanding rapidly, and you need application-specific configurations for computing clusters. It is evolving further still, beyond these large and hyperscale configurations toward what I call the next tier of data centers.

All of this extends to truly customized applications that can run on edge devices: achieving very low latency right on the factory floor, placing language models at the source of data creation, and targeting end-user devices directly.

We have integrated our AI inference accelerators into our personal computers, which shipped throughout 2023, and at CES this year we announced our next-generation AI-accelerated personal computers. As our Xilinx product portfolio extends into embedded devices, we have seen significant industry demand for customized inference covering a wide range of embedded application scenarios. As this trend develops, we will see more customized computing installations to meet the growing demand.

Q: Makes sense. Some portion of inference (AI computing tasks) will be pushed to edge computing in the future; obviously we will run some small models on devices, whether laptops or phones. ("Edge computing" here means processing data near where it is generated rather than sending it to a data center or the cloud, which reduces latency and improves responsiveness.)

At least in the short term, there may be ongoing constraints for large models and large data centers. What are the main limiting factors on the GPU supply side, including packaging issues, TSMC's capacity, and other possible factors? Some say that once the current constraints are addressed, the next issue is whether data centers have enough power to run these devices. I'm curious how you think about these limiting factors and when supply and demand can come into better balance?

A: Frankly, supply-demand balance is a challenge every chip manufacturer must manage; you have to secure your supply. Looking back at the pandemic, demand for our devices surged as people worked from home, putting pressure on our supply chain for PCs and our x86 servers. We were in "emergency mode" during that period. We did well: although there were substrate shortages, we brought additional substrate manufacturing capacity online.

We work closely with our principal foundry partner, TSMC, with whom we have built a deep partnership over many years. If we can anticipate and read market signals early, we can usually meet supply, and when shortages occur we can usually manage them well. On artificial intelligence, we clearly see a significant increase in demand, and the fabs are responding. But you're absolutely right that this is not just a fab issue. When it comes to packaging, both we and our GPU competitor use advanced packaging technologies. I'll show you; although the camera may not capture it clearly, this is our MI300. What you see is a complete set of chiplets: smaller dies providing CPU functions, IO, and memory controllers. One variant includes CPU chiplets for our high-performance computing focus.

We integrate our CPU chiplets directly into the same package, surrounded by all the high-bandwidth memory that feeds these engines. The chips are connected laterally, and on the MI300 we also stack devices vertically. So it's a complex supply chain, but we are very, very good at it, with some 18 years of experience. Our AMD supply chain team is doing a great job, and I believe the industry as a whole will get past these supply constraints.

Now, you mentioned power. I think this will ultimately be a key limiting factor. You see all the major operators looking for power sources, and for engine developers like us, energy efficiency is a central concern: we will drive efficiency improvements in every generation of product we design. This is absolutely one of our top priorities.
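The energy-efficiency priority described above reduces to a simple metric: useful throughput divided by power. A minimal sketch, with hypothetical numbers that are not AMD specifications:

```python
# Sketch: comparing accelerator generations by performance per watt.
# All figures are hypothetical, purely to illustrate the metric.

def perf_per_watt(tflops: float, watts: float) -> float:
    """Throughput (TFLOPS) per watt of board power."""
    return tflops / watts

generations = {
    "gen N":   {"tflops": 400, "watts": 500},
    "gen N+1": {"tflops": 700, "watts": 650},
}

for name, g in generations.items():
    print(f"{name}: {perf_per_watt(g['tflops'], g['watts']):.2f} TFLOPS/W")
```

The point of the metric is that raw performance can rise while efficiency falls; a generation only "wins" on power if TFLOPS grows faster than watts.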

Q: With Moore's Law ending, the doubling of transistor counts every two years is slowing, and how to keep improving computing power through innovation has become an important topic. You mentioned that this challenge sparked your interest in joining AMD, so I'd like to understand how AMD is investing in different innovative directions. I'm also curious about 3D stacking and would appreciate a simple explanation of how vertically stacking chips increases integration and performance.

A: Simply put, 3D stacking is an advanced packaging technique that stacks multiple chips on top of one another, increasing integration and performance while saving space. As Moore's Law slows, the gains from moving chip technology from one generation to the next shrink, meaning we can no longer rely on new semiconductor nodes alone to shrink devices, improve performance, reduce power, and hold cost constant.

Therefore, more innovation is needed now: comprehensive design thinking that goes beyond relying on new device transitions and new wafer node technologies.

Heterogeneous computing means bringing the right computing engines to the right applications, such as the ultra-low-power AI accelerators we put in personal computers and embedded devices. It involves customizing engines for specific applications, combining chiplets into a whole, selecting the best technology node for each, and considering the software stack in the design. This optimization runs from transistor design all the way up to the integration of computing devices, viewed through the lens of software stacks and applications. Like all engineers at AMD, I am excited to do this work because we have the building blocks, and AMD's culture embodies a spirit of collaboration: we don't need to develop the entire system or application stack ourselves, but we ensure solutions are optimized through deep collaboration.
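The "right engine for the right application" idea above can be sketched as a toy dispatcher. The engine names and routing rules below are hypothetical simplifications for illustration, not an AMD scheduling API:

```python
# Toy dispatcher illustrating heterogeneous computing: route each workload
# to the engine type best suited to its parallelism profile.

from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    parallelism: str  # "serial", "data_parallel", or "matrix_heavy"

def pick_engine(w: Workload) -> str:
    """Serial/branchy code -> CPU; wide data parallelism -> GPU;
    dense matrix math (e.g. AI inference) -> NPU/AI accelerator."""
    return {
        "serial": "CPU",
        "data_parallel": "GPU",
        "matrix_heavy": "NPU",
    }[w.parallelism]

jobs = [
    Workload("os scheduling", "serial"),
    Workload("image rendering", "data_parallel"),
    Workload("llm inference", "matrix_heavy"),
]
for job in jobs:
    print(f"{job.name} -> {pick_engine(job)}")
```

A real system would weigh power, memory locality, and data-movement cost rather than a single label, but the principle is the same: match the engine to the workload instead of running everything on one device.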

Q: How to ensure the security of chip manufacturing and the stability of the supply chain in the current global political and economic landscape?

A: These are crucial issues that we must consider. We strongly support international cooperation to address these challenges. The reliance on chip design to operate essential systems raises concerns about ensuring supply chain continuity for national security.

Therefore, we have incorporated this into our strategy and are working with our partners on it. We support the expansion of wafer fabs: TSMC is building a fab in Arizona, where we are working with them, and Samsung is constructing fabs in Texas. But our expansion is not limited to the U.S.; we are also expanding globally, with facilities in Europe and elsewhere in Asia.

This goes beyond just foundries; packaging is also a concern. When you place chips on a carrier, you need interconnects, and the ecosystem must have geographic diversity.

We believe it is crucial for everyone to be aware of the importance of geographic diversity. We are deeply involved in this effort. I am very pleased with the progress we have made. This is not something that happens overnight. It is different from software development. While software allows for quick ideation and product launches, expanding the supply chain requires years of preparation. The semiconductor industry has historically been built this way. It is a global industry chain that will create clusters of geographical expertise.

This is where we stand today, and in the face of a more turbulent macro environment, diversifying manufacturing capabilities becomes even more critical. This work is already underway.

Q: How do you view the development of AI hardware? AMD now powers many interesting devices and applications. What are your thoughts on current developments such as Vision Pro, Rabbit (an AI-centric device), Humane (the Ai Pin wearable), and Figure? There seems to be a sudden explosion of new hardware devices. What trends indicate a product will succeed, which might signal failure, and how should we approach this array of new devices?

A: This is an excellent question. I will start from a technical perspective. As a chip designer, you should take pride in the simultaneous emergence of these different types of hardware because the computing power is increasing, the size is shrinking, and the power consumption is very low.

You can see more and more devices with incredible computing and audiovisual capabilities. Devices like Meta Quest and Vision Pro did not happen overnight: early versions were too heavy, too large, and lacked sufficient computing power.

Because of the high latency between photons hitting the device's screen and the actual processing, wearing one to watch a movie or play a game could make you uncomfortable.

First of all, I am proud of the technological advances we have made as an industry, and we are certainly proud of AMD's contributions. But the broader question you raised is: how do you know what will succeed? Technology is an enabler.

But if there's one thing I learned at Apple, it's that truly successful devices meet a need. They give you a capability you love; it's not just incremental. It must be something you enjoy, something that creates a new category. It is enabled by technology, but the product itself must truly pique your interest and give you new capabilities. Let me mention one example: AI enablement in PCs. I think this will almost create a new category of PC, because of the types of applications you will be able to run with super-high-performance yet low-power inference. Imagine I don't speak English at all and I'm watching this podcast live; I click on real-time translation and it is rendered in my own language with no perceivable delay. That is just one of countless new applications this will enable.

Yes, I think this is a very interesting period because over the years, companies like AMD have benefited from it, right?

You are also in the data center, and so many computing workloads have moved to servers in the era of the cloud and of all these complex consumer social applications. In the new era, all these new application companies trying to create experiences are treating latency as a major consideration, because networks are involved and models are slow; you want to swap models, and you want to do things on the device again. It just hasn't been a real design consideration for a while. So, I agree with your view. I think this is one of the next set of challenges: truly enabling high-performance AI applications not just in the cloud, but at the edge and on user devices.
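The latency tradeoff discussed above can be made concrete with a toy model: on-device inference avoids the network round trip, so a slower local engine can still win end to end. All numbers below are made-up assumptions, not measurements:

```python
# Toy model of end-to-end inference latency: on-device vs. cloud round trip.
# All figures are illustrative assumptions.

def cloud_latency_ms(network_rtt_ms: float, server_compute_ms: float,
                     queueing_ms: float = 0.0) -> float:
    """The cloud path pays the network round trip plus any queueing delay."""
    return network_rtt_ms + queueing_ms + server_compute_ms

def edge_latency_ms(device_compute_ms: float) -> float:
    """The on-device path has no network hop; compute is slower but local."""
    return device_compute_ms

# A small model may run slower on a laptop NPU than on a datacenter GPU,
# yet still win once the network round trip and queueing are added.
cloud = cloud_latency_ms(network_rtt_ms=60, server_compute_ms=15, queueing_ms=10)
edge = edge_latency_ms(device_compute_ms=40)
print(f"cloud: {cloud} ms, edge: {edge} ms")
```

The crossover depends entirely on the assumed numbers; the design lesson is that for interactive applications, where the model runs is as much a latency decision as how fast it runs.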

Q: What are AMD's deployments in 2024?

A: For us, this is an important year, because we have spent many years developing our hardware and software to support artificial intelligence, and we have just completed AI enablement across our entire product portfolio. So in the cloud, at the edge, in our personal computers, our embedded devices, and our gaming devices, which we are upgrading with AI, 2024 is truly a huge deployment year for us.

So now the foundation has been laid and the capabilities are in place, with all the partners I mentioned. 2024 is a huge deployment year for us. We are often overlooked in artificial intelligence; everyone knows our competitor. But based on results, capabilities, and the value we provide, we hope to be recognized in 2024 as a company that truly enables and popularizes generative AI, in the cloud, in large-scale LLM training and inference, and across the entire computing field. This is a year in which the extension and combination of application capabilities has come alive. I see what Microsoft is talking about and enabling, from the cloud to the client side; it's very exciting. Many independent software vendors (ISVs) I've spoken with are doing the same thing. And frankly, Sarah, they are addressing exactly the question you asked: how do I write my application to give you the best experience while leveraging both the cloud and the devices you run it on, whether in your hand or on your laptop.

So this will be a transformative year, and we at AMD are very excited, feeling like we are at the center of it all.