Full 10,000-word transcript! NVIDIA investor exchange: the ChatGPT moment for robots is just around the corner, and Blackwell is not just a chip but a system

Wallstreetcn
2024.03.20 07:52

Huang Renxun also emphasized that NVIDIA sells entire data centers, that the company's software business may become as important as its chip business in the long run, and that it has already built a brand-new "operating system" for robots.

NVIDIA's "AI nuclear bomb" Blackwell "blows up the audience" at the GTC developer conference, while founder Huang Renxun continues to "dominate the scene" at a later investment exchange meeting.

Huang Renxun pointed out that Blackwell is not just a chip but a computer system. Blackwell greatly raises the industry standard, making it difficult even for ASICs to compete. NVIDIA has built a complete supercomputer, providing a full range of solutions from chips to systems, interconnects, NVLink, networking, and software.

NVIDIA stated that Blackwell will be shipped later this year, but did not provide a more specific timeline. NVIDIA mentioned that they have discussed design requirements with customers, but there may be supply constraints in the early stages of the launch.

Huang Renxun predicted that the ChatGPT moment for robots is just around the corner, and that NVIDIA has built a new "operating system" for robots.

Huang Renxun also emphasized that NVIDIA's software business may become as important as its chip business in the long run; it targets AI optimization and real-time, supercomputing-class algorithm problems, and has huge potential in enterprise software.

Here are some highlights of Huang Renxun's views:

  • Blackwell is not just the chip at the core of the system; it is a computer system. What NVIDIA has done is not just manufacture chips but build a complete supercomputer, providing a full range of solutions from chips to systems, interconnects, NVLink, networking, and software.

  • Blackwell significantly raises the industry standard, making it difficult even for ASICs to compete.

  • If we don't shift to accelerated computing, data processing costs will continue to rise. Companies that have realized this, such as AstraZeneca, Visa, American Express, Mastercard, and many others we collaborate with, have reduced data processing costs by 95%, which is basically a 20-fold reduction.

  • We have developed accelerated algorithms so rapidly that the marginal cost of computation over the past decade has dropped significantly, making it possible for new software development based on generative AI.

  • The world's trillion-dollar data centers will eventually be accelerated.

  • If AI can tokenize words and speech patterns, why can't it mimic us and generalize the way ChatGPT does? The ChatGPT moment for robots therefore seems imminent.

  • Omniverse is not just a tool, nor just an engine, it is a series of technical APIs designed to provide strong support for others' tools.

  • NVIDIA is a market creator, not a market sharer. Everything we do starts with a technology that doesn't yet exist when we begin. Even when we started working on 3D computer games, they didn't exist.

  • NVIDIA's uniqueness lies in the fact that we believe we are the only chip company that can create its own market. Look at all the markets we are creating. We drive demand through software, which in turn promotes chip development. This model not only makes NVIDIA a technological innovator, but also a market leader.

  • If your software is accelerated, I am fairly certain it runs on NVIDIA, most likely because it ran on NVIDIA first.

  • NVIDIA's software stack focuses on two things, one of which is algorithms that help computers run better, such as TensorRT-LLM, and the other is real-time algorithm discovery in the software we develop.

  • So we will produce NIMs on a very large scale, and I guess this will be a very big business, part of the industrial revolution.

  • Remember, we sell data centers, we just break them down. But in the end, what we sell is the entire data center.

  • Today, the global data center market size is $1 trillion... with a market of $250 billion annually... our percentage of the $250 billion market annually may be much higher than before.

  • As for NIMs, we offer two paths to help enterprise customers access AI. One is through our website and extensive solution provider network, enabling NIMs to be turned into applicable applications... The other, more exciting direction is tools that come with their own copilots; I think there will be a major transformation here.

  • In the next five to eight years, we will begin to see the update cycle of our own infrastructure. Nevertheless, I believe the current updates are not the best use of capital.

  • Omniverse is a platform that applies the concept of physics simulation feedback in the physical world, training AI through simulating interactions with various processes in the physical world. In short, we are leveraging the same principles and concepts to drive the widespread application of AI technology in different scenarios.

  • Particularly noteworthy is Isaac Sim, a robot training and simulation system built on the Omniverse platform, which has been very successful for anyone in the industry. We have built a new "operating system" for robots.

NVIDIA GTC Investment Exchange Meeting Full Translation

Jensen Huang:

Good morning. It's great to see everyone. How will our event proceed?

Colette Kress:

Alright. We have a full room today; thank you all for attending our first in-person event in a long time. Jensen and I are here to answer your questions from yesterday. Just raise your hand, and we will pass the microphone to you.

We believe this is a better plan for you. I know you have asked many questions, whether last night or this morning. Today, we will only have a Q&A session instead of a formal speech. It sounds like a good plan.

I will let Jensen see if he wants to add some opening remarks, as we have a brief introduction. Okay.

Huang Renxun:

Yes. Thank you. First of all, it's great to see everyone. There are many things I wanted to say yesterday, maybe things I've said before, that I wanted to say better. But I have to tell you, I have never spoken at a rock concert before. I don't know about you, but I have never spoken at a rock concert before. I simulated what it would be like, but when I stepped on stage, it still got to me. Anyway, I did my best.

Next time, after some more of this tour, I will do better, I am sure. I just need more practice. But there are a few things I want to tell you, and one is spatial computing. By the way, if you have the chance to see Omniverse on the Vision Pro, it's mind-blowing. It's hard to convey how real it looks.

Okay. So we talked about five things yesterday, and I think the first one deserves some explanation. The first, of course, is this new industrial revolution: two shifts are happening.

The first is the shift from general-purpose computing to accelerated computing. If you just look at the trend in general-purpose computing, it has slowed significantly over the past few years.

In fact, we have known it has been slowing for about ten years; people just didn't want to deal with it, but now you really have to. You can see people extending the depreciation cycle of their data centers because of this. You can buy a whole new set of general-purpose servers, and it won't significantly increase the throughput of your entire data center.

So you might as well extend the use of existing equipment for a while. This trend will never reverse. General computing has reached its limit. We will continue to need it, and there is a lot of software running on it, but obviously we should accelerate everything we can accelerate.

Many workloads from different industries have already been accelerated, some of which are large workloads that we really want to further accelerate. But the benefits of accelerated computing are very, very clear.

Data processing is one area I didn't spend time on yesterday, but I really want to talk about it. NVIDIA has a whole set of libraries for this; before a company can do anything, it has to process data. Of course you have to ingest data, and the amount is extraordinary: the world's zettabytes of data double every few years, but computing power does not.

So many companies are already on the wrong side of the data processing curve; if they do not shift to accelerated computing, data processing costs will continue to rise. Companies that realize this, such as AstraZeneca, Visa, American Express, Mastercard, and many others we work with, have reduced data processing costs by 95%, basically a 20-fold reduction.

With its RAPIDS suite of software libraries, NVIDIA's data processing acceleration is astonishing. Ion Stoica, the inventor of Apache Spark, founded a great cloud-scale data processing company called Databricks. Databricks has announced that it will accelerate its Photon engine, the crown jewel of the company, with NVIDIA GPUs.
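To make the data processing point concrete, here is a minimal sketch of a pandas-style workload running on the GPU with RAPIDS cuDF; the file name and columns are hypothetical, and actual speedups depend on the data and hardware.

```python
# Minimal sketch: a pandas-style aggregation executed on the GPU with RAPIDS cuDF.
# The CSV path and column names are hypothetical; install RAPIDS per NVIDIA's documentation.
import cudf

# Ingest: read a (hypothetical) transactions file directly into GPU memory.
df = cudf.read_csv("transactions.csv")

# Transform and aggregate entirely on the GPU, with the same API shape as pandas.
positive = df[df["amount"] > 0]
totals = positive.groupby("merchant_category")["amount"].sum().sort_values(ascending=False)

print(totals.head())
```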

So the benefits of acceleration, of course, can save costs for customers, but more importantly, this allows you to sustain computation. Otherwise, you are on the wrong side of the curve and will never be on the right side of the curve.

So is the move to accelerated computing a question for today or for tomorrow?

We have developed accelerated algorithms so rapidly that the marginal cost of computation over the past decade has dropped significantly, making it possible for a new software development approach based on generative AI.

As you know, generative AI requires a huge number of floating-point operations, a huge amount of computation. This is not a normal amount of computation but a very large amount, yet it can now be done effectively, and consumers can use incredible services like ChatGPT. So it is worth noting that accelerated computing has driven the marginal cost of computation so far down that it makes a whole new way of doing things possible.

This new way is software written by computers using a raw material called data. You apply energy to it. There is a tool called GPU supercomputer. What comes out is the tokens we enjoy. When you interact with ChatGPT, all you get are tokens.

Now, that data center is not an ordinary data center. It is not the data center you knew in the past. The reason is, it is not shared by many people. It doesn't do many different things. It is an application that runs all day, not just to save money, its job is to make money, it is a factory.

This is no different from the AC generator of the last industrial revolution. There is no difference: the raw material coming in was water, energy was applied to it, and out came electricity. Now the raw material is data; it is refined through processing and becomes a generative AI model.

What comes out are valuable tokens. We apply this basic method—some call it inference, but it is actually token generation—to make software. This is how we produce software, produce data, interact with you, how ChatGPT interacts with you, this is how we cooperate, collaborate with you.

You can expand this idea as much as possible, from Copilots to AI agents, you can expand this idea as much as possible, but it is basically the same idea. It generates software, generates tokens, it comes from an AI generator called the GPU supercomputer. Does this make sense? So these two ideas, one is that the traditional data centers we use today should be accelerated, and they are being accelerated. They are being modernized, with more and more industries one after another. So the world's trillions of dollars worth of data centers will eventually be accelerated.

The question is, how many years will it take to complete? But because of the second dynamic, the benefits in artificial intelligence are incredible, and it will further accelerate this trend. Does this make sense?

However, the second type of data center, which I call the AC generator of our era, the AI generator, the AI factory, is something completely new. It's a new kind of thing, new kinds of software that generate a new kind of valuable resource, created by companies, industries, countries, and so on. It's a new industry.

I also talked about our new platform. There has been a lot of speculation about Blackwell. Blackwell is not just the chip at the heart of the system; it is a computer system. What NVIDIA has done is not just manufacture chips but build a complete supercomputer, providing a full range of solutions from chips to systems, interconnects, NVLink, networking, and software.

Can you imagine how much electronics gets delivered, and then having to figure out how to program it all? Without all the libraries created over many years to make it effective, you would be bringing an asset worth hundreds of millions of dollars into your company.

And as long as it sits unused, your money is wasted, and the cost is incredible. So we are not just selling chips; we help customers bring systems up and put them to use, and then keep working with them to make it better and better, which is really important.

Alright. That's what NVIDIA does. We call the platform Blackwell, which has all these components associated with it, and at the end of the presentation, we show you these to let you understand the scale we have built. All of this, we then break it down. This is the very, very difficult part of what we do.

We built this vertically integrated thing, but we built it in a way that can be dismantled later, so you can buy parts of it, because maybe you want to connect it to x86. Maybe you want to connect it to the PCI-Express bus interface.

Maybe you want to connect it to a bunch of optical components, maybe you want a very large NVLink domain, maybe you want a smaller NVLink domain, maybe you want to use Arm, and so on. Is this feasible? Maybe you want to use Ethernet, Ethernet is not good for AI. No matter what anyone says, that's the fact.

The fact is the fact: Ethernet, as it stands, is not friendly to AI, and that's reasonable. In the future Ethernet can become AI-friendly; that is Ultra Ethernet. In about three to four years, Ultra Ethernet will arrive and will be friendlier to AI. Until then, it is a good network, but not good for AI. So we extended Ethernet, adding something called Spectrum-X, which basically does adaptive routing, congestion control, and noise isolation.

Remember, when you have a noisy neighbor, it eats into network traffic. AI doesn't care about the average throughput of the network, and maximum average throughput is exactly what Ethernet was designed for. AI only cares about when the last student turns in their part of the homework; it is gated by the last one to finish. That is a fundamentally different design point: if you design for the slowest participant rather than the average, you end up with a different architecture. Does this make sense?

Okay. Because AI does all these aggregations; look inside the algorithms, the transformer algorithm, the mixture-of-experts algorithm, and you will see it. All these GPUs must communicate with each other, and the last GPU to submit its answer slows everyone down. That's how it works, and that's why the network has such a big impact.

Can the network cover everything? Yes, but will it lose 10% or even 20% of utilization? Yes. If a computer costs $10,000, losing 10% to 20% of utilization doesn't matter. But what if the computer costs $2 billion? That loss is the cost of the entire network; that's the cost of building a supercomputer.
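To see how the "last student" effect turns into lost utilization, here is a small, self-contained simulation with purely illustrative numbers: in a synchronous step, every GPU waits for the slowest one, so tail latency rather than average throughput sets the pace.

```python
# Illustrative only: why the slowest participant sets the pace of a synchronous AI step.
import random

random.seed(0)
num_gpus = 64
steps = 1000

ideal_time = 0.0    # if every GPU always took the average time
actual_time = 0.0   # synchronous step: everyone waits for the slowest GPU

for _ in range(steps):
    # Hypothetical per-GPU step times: 10 ms of compute plus random network jitter.
    times = [0.010 + random.expovariate(1 / 0.001) for _ in range(num_gpus)]
    ideal_time += sum(times) / num_gpus
    actual_time += max(times)  # gated by the last GPU to finish

print(f"average-based estimate: {ideal_time:.2f} s")
print(f"actual synchronous time: {actual_time:.2f} s")
print(f"utilization lost to stragglers: {1 - ideal_time / actual_time:.0%}")
```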

So anyway, I showed examples of all these different components. Our company created a platform, all the related software, and all the necessary electronics, and then we work with companies and customers to integrate them into their data centers, because their security may be different, their thermal management may be different, their management plane may be different; maybe they want to use it for just one thing, AI, or maybe they want to rent it out for many people to run many different AIs.

The use cases are so wide-ranging. Maybe they want to build a local platform, they want to run VMware on it. Maybe someone just wants to run Kubernetes, someone wants to run Slurm. Well, I can list all different kinds of environments, it's absolutely amazing.

We considered all these factors, for quite a long time, and now we know how to serve everyone. As a result, we can build supercomputers at scale. But basically what NVIDIA is doing is building data centers. Okay. We break it down into smaller parts and sell them as components. People therefore think we are a chip company. The third thing we do is NIMs, an innovative software.

Large language models are a miracle, ChatGPT is a miracle, not only in its capabilities, being able to interact at a very high response rate, but also the team behind it is a miracle. It's a world-class computer science organization, not just an ordinary computer science organization.

The OpenAI team is working on this, and they are world-class, one of the best teams in the world. Well, for every company to be able to build its own AI model, operate its own AI, deploy its own AI, and run it across multiple clouds, someone has to do the computer science for them. Therefore, we have decided to do this for each individual model, each individual company, each individual configuration. We have decided to create tools, toolkits, and operational platforms, and we will package large language models for the first time.

You can purchase it, come to our website, download it, and run it. All these models are free, but there are operating costs. When you deploy it in an enterprise, the operating cost is $4,500 per GPU per year.

Okay. So the cost of each use is very low, very cheap, but the benefits are very significant. We call it NIMs (NVIDIA Inference Microservices). NIMs come in many varieties, such as supporting visual recognition, speech recognition, text recognition, and facial recognition. You will have robot joints, you will have various types of NIMs.

The way to use these NIMs is to download them from our website and fine-tune them according to your needs. Just give an example.

You said, "The answer to that question is not entirely correct. It may be correct in another company, but not here. So I will give you some examples." This is exactly what we want it to be like. You show it your work product. That's a good answer.

Our system helps you plan this process, tagging all data related to AI processing, all data processing, fine-tuning, evaluating, setting boundaries, which will make your AI model more effective and targeted.

Making it more targeted matters because, if you are a retail company, you don't want your AI talking about random things; no matter what the question is, it stays on topic. The system that enforces those boundaries is itself another AI. So we have all these different AIs to help you customize our NIMs, and you can create all kinds of NIMs.
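Once a NIM is deployed, it looks like an ordinary service endpoint. Here is a minimal sketch of querying one through the OpenAI-compatible chat interface that NIM microservices expose; the host, port, and model name are placeholders for whatever your deployment uses.

```python
# Minimal sketch: calling a deployed NIM through its OpenAI-compatible chat endpoint.
# Host, port, and model name are placeholders; substitute your own deployment's values.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "meta/llama3-8b-instruct",  # example model name
        "messages": [{"role": "user", "content": "Summarize our return policy."}],
        "max_tokens": 128,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```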

We provide frameworks for many of them. One of the most important is understanding proprietary data, because every company has proprietary data. We have created a microservice called Retriever; it is state-of-the-art, and it helps you embed your data, whether it is structured or unstructured, images, charts, graphs, whatever it is.

We help you extract meaning from this data. Then we get - it's called semantics, that semantics is embedded into a vector, that vector is now indexed into a new database, called a vector database, okay. Then that vector database, later you can interact with it. You say, "Hey, how many mammals do I have, for example." It goes in and says, "Hey, look at that. You have a cat, you have a dog, you have a giraffe, these are the things in your inventory, and so on."
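The embed-index-query flow behind the Retriever can be illustrated with a toy, self-contained example; the hash-based "embedding" below is only a stand-in for a real embedding model, but the structure (embed documents, index the vectors, embed the query, look up the nearest vectors) is the same.

```python
# Toy illustration of the embed -> index -> query flow behind a vector database.
# The hashing "embedding" is a stand-in for a real embedding model, not for production use.
import hashlib
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Deterministic stand-in embedding: hash words into a fixed-size unit vector."""
    vec = np.zeros(dim)
    for word in text.lower().split():
        h = int(hashlib.md5(word.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# "Inventory" documents to index.
docs = ["We have three cats in stock", "Two dogs available", "One giraffe on order"]
index = np.stack([embed(d) for d in docs])   # the (toy) vector database

query = embed("how many mammals do I have")
scores = index @ query                        # cosine similarity, since vectors are unit norm
for score, doc in sorted(zip(scores, docs), reverse=True):
    print(f"{score:.2f}  {doc}")
```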

All of this is called NeMo. We host a standard NVIDIA infrastructure, DGX Cloud, on all the clouds: AWS has DGX Cloud, Azure has DGX Cloud, and GCP and OCI have it too. So we collaborate with companies around the world, especially enterprise IT companies, to create these great AIs together. And when they are ready, they can run on DGX Cloud, which means we effectively bring customers to clouds around the world.

Platform companies like us bring customers to system manufacturers and service providers, just like we bring customers to HP, Dell, IBM, Lenovo, Supermicro, CoreWeave, and so on.

If you are a platform company, you create opportunities for everyone in the ecosystem. So, DGX Cloud allows us to bring all these enterprise applications to service providers. We have a good partnership with Dell, and we announced yesterday that HP and other companies can use these NIMs in their systems.

Then I talked about the next wave of AI, which is actually about industrial AI. In terms of dollars, the largest industry in the world is heavy industry, and heavy industry has never really benefited from IT. They have not benefited from chip design and digitization.

The chip industry has been completely digitized, and our technological progress is amazing. We call it chip design, not chip discovery. So why do they call it drug discovery, as if tomorrow could turn out differently from yesterday? Because biology is so complex, with so much variation, and the longitudinal effects are so large; as you know, life evolves at a different pace than transistors. Causality is harder to observe because it plays out across large systems and long spans of time. These are very complex problems.

Industrial physics is very similar. So we finally have the ability to use large language models, the same technology. If we can tokenize proteins, if we can tokenize words, tokenize speech, tokenize images, it's no different from speech, right?

We can tokenize all these different things. We can tokenize physics, and then we can understand its meaning, just like we understand the meaning of words.

If we can understand its meaning, and we can connect it to other modalities, then we can develop generative AI. So I quickly explained that 12 years ago, our company saw this on ImageNet. The real breakthrough was actually 12 years ago.

But what are we actually looking at? Everyone should find ChatGPT interesting, but what are we looking at? What we see is a computer software that can mimic humans. It mimics the output of our language by analyzing our language.

So the question is, if AI can tokenize words and speaking styles, why can't it mimic us and generalize the way ChatGPT does? The ChatGPT moment for robots therefore seems imminent, and we hope everyone embraces it. As a result, we have developed an operating system that lets AI practice in a world governed by physical laws, which we call Omniverse. But remember, Omniverse is not just a tool, nor just an engine; it is a set of technology APIs designed to provide powerful support for other people's tools. In this regard, I am very excited about our collaboration with Dassault, which is upgrading its 3DEXCITE product through the Omniverse APIs. Microsoft is also connecting its Power BI product to it.

Rockwell has already connected Omniverse to their industrial automation tools, and Siemens has connected as well. So these are a set of physics-based APIs that generate images or articulated motion and connect many different environments. These APIs are designed to enhance third-party tools. I am very pleased to see their adoption, especially in industrial automation.

So, these are the five things we are doing.

I'm sorry, I'm running out of time, but let me quickly move on to the next step. Look at this chart, it basically conveys a few things. At the top are the developers. NVIDIA is a market creator, not a market sharer. Everything we do, when we start working on a technology, it doesn't exist. Even when we started researching 3D computer games, they didn't exist.

So we have to create the necessary algorithms, real-time ray tracing technology that didn't exist until we created it. So all these different capabilities didn't exist before we created them. Once we created it, there were no applications to apply it to. So we have to nurture developers, collaborate with developers, integrate the technology we just created so that applications can benefit from it.

We created Omniverse from scratch, without taking market share from anyone. And now, we need developers like Dassault, Ansys, Cadence, Rockwell, Siemens to collectively advance it and make it more impactful. I am very proud to say,

In the form of cloud APIs, Omniverse is more user-friendly, whether through SDKs or APIs, we provide convenience for developers. We host Omniverse on Azure cloud, which not only creates value for customers but also brings opportunities for Azure.

So Azure is the foundation, the system provider. In the past, system providers were OEMs, and they still are, but system providers are at the bottom and developers are at the top. We invent technology in the middle. The technology we invent ultimately shows up as chips, but it starts with software; without developers, there is no demand for chips.

NVIDIA is first and foremost an algorithms company. We create these SDKs, which are called domain-specific libraries. SQL (Structured Query Language) is such a library, and NVIDIA's cuDNN (the deep learning GPU acceleration library) may be the most successful domain-specific library in the world after SQL. Without cuDNN, nobody could have used CUDA for deep neural networks; that is why cuDNN was invented.

We have hundreds of domain-specific libraries, Omniverse being one example. These domain-specific libraries combined with software developers create opportunities for underlying infrastructure when applications are created and there is a demand.

So the lesson is that without software innovation, new markets cannot emerge. This idea has never changed. You can make chips that make existing software run better, but you cannot create a new market without software. NVIDIA's uniqueness lies in the fact that we believe we are the only chip company that can create its own market. Look at all the markets we are creating: we drive demand through software, which in turn drives chip development. This model makes NVIDIA not only a technological innovator but also a market leader.

That's why we always talk about the future. These are the things we are working on. Nothing excites me more than collaborating with the entire industry to create the computer-aided drug design industry, not the drug discovery industry, but the drug design industry. We must approach drug design as we do chip design.

So developers are at the top and our infrastructure is at the bottom. What developers want is simple: they want your technology to perform well, but above all it must solve problems they cannot solve any other way.

But the most important thing for developers is the installed base, because they don't sell hardware; if no one has the hardware to run their software on, it will not be used.

So what developers want is an installed base. This principle has not changed from the beginning, and it has not changed now. If you develop artificial intelligence software and want to deploy it and make it available for people to use, you need an installed base.

Second, system companies want "killer applications." That's why the term "killer application" exists, because where there are killer applications, there is customer demand, where there is customer demand, you can sell hardware.

So, it turns out that this cycle is very difficult to start. How many accelerated computing platforms can you really build? NVIDIA can build an accelerated computing platform for generative AI, as well as drive the development of these technologies in industrial robots, quantum, 6G, weather forecasting, and other fields.

NVIDIA has built a general-purpose accelerated computing platform that covers application areas such as fluid dynamics, particle physics, biology, robotics, AI, and SQL, and has successfully driven the vast majority of accelerated software.

You need a sufficiently general-purpose accelerated computing platform to run different types of software. It has taken NVIDIA a long time, but it basically runs everything. If your software is accelerated, I am fairly certain it runs on NVIDIA, most likely because it ran on NVIDIA first. This is NVIDIA's architecture. Whenever I give a keynote, I tend to cover all areas and some new things, such as Blackwell. I talked about a lot of good things; you really should go check out our 1,000 talks. Will 6G happen? Of course, and it will be AI.

Why should the MIMO neural receiver be pre-defined, with the algorithms fixed before you even know the site? We should have site-specific MIMO, just like site-specific robotics: reinforcement learning, interacting with the environment. So 6G will of course be software-defined, and of course it will be AI.

Of course, we are still excellent partners in the quantum computing industry. How to operate a quantum computer? How to build the fastest computer in the world? How to motivate quantum computers? How to simulate quantum computers? What is the programming model for quantum computers?

Programming a quantum computer on its own is far from enough; it needs to be built on a foundation of classical computing. So quantum will become a kind of quantum accelerator.

So who should do that? We have, and we cooperate with the entire industry on it. So overall, some very, very great things. I wish I could cover it all; we could do an entire keynote just on these topics, but covering the whole field, that was yesterday.

Q&A Session

Colette Kress:

Alright. We have staff walking around to see if we can get some valuable questions.

Jensen, that's surely my first question: if you can give the keynote in 10 minutes, why didn't you just take 10 minutes yesterday? Good question.

Ben Reitzes:

I'm Ben Reitzes from Melius Research, nice to meet you.

Huang Renxun:

Thank you, Ben.

Ben Reitzes:

This is a huge stimulus for all of us. So I want to understand more about your vision for software. You are creating an industry, you have a comprehensive solution. Obviously, NVIDIA's software makes NVIDIA's chips run better.

Do you think that in the long run NVIDIA's software business can be as big as the chip business? What will it look like in 10 years, given NVIDIA's momentum in software and in the AI chip industry? It seems like it will only keep growing.

Huang Renxun:

Thank you, Ben. First of all, thank you all for coming. This is a very different type of event, you know. Most of our talks are about software; they are given by computer scientists talking about algorithms. NVIDIA's software stack focuses on two things. One is algorithms that help software run better on computers, such as TensorRT-LLM. That is an extremely complex piece of software that explores the computational space in a way most compilers have never needed to. TensorRT-LLM cannot even be built without a supercomputer, and it is very likely that in the future TensorRT, and future versions of TensorRT-LLM, will have to run on supercomputers in order to optimize AI for everyone's computers. So this optimization problem is very, very complex.
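For readers who want to see what this looks like from the user's side, here is a rough sketch using TensorRT-LLM's high-level Python LLM API; the model name is only an example, and exact class and argument names can vary between releases, so treat this as a sketch rather than a reference.

```python
# Sketch of TensorRT-LLM's high-level LLM API (names may differ across versions).
from tensorrt_llm import LLM, SamplingParams

# Building the engine is where the optimizing compiler does the heavy exploration of the
# computational space (kernel selection, fusion, scheduling) described above.
llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")  # example model identifier

outputs = llm.generate(
    ["Why does accelerated computing lower the marginal cost of a token?"],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```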

The other thing is that the software we develop involves real-time algorithm discovery. For example, Navier-Stokes, or the Schrödinger equation, expressed in a way that suits supercomputing and accelerated computing. Real-time ray tracing is a good example: it simply did not exist until it was discovered. Does that make sense? Okay. So, as you know, Navier-Stokes is an extremely complex algorithm.

Being able to reconstruct it in real-time is also very complex, requiring a lot of invention. Some of our computer scientists at the company have won Oscars for solving these problems on such a large scale, and then the film companies use it to make movies. Their inventions, their algorithms, their data structures are computer science itself. Okay. So we will focus on these two layers.

And then, when you package it: in the old days, this was useful for entertainment, media, science, and so on. But today, AI has brought this technology to the edge of everyday applications. Simulating molecules used to be something you studied in college; now you can do it at work.

So when we now provide all these algorithms to businesses, it becomes enterprise software. Unprecedented enterprise software. We put them in NIMs, these packages. We will mass-produce these things and support them, maintain them, keep their performance up to support customers using them.

So we will produce NIMs on a very large scale, and I guess this will be a very big business, part of the industrial revolution. If you look at today's IT industry, it is that layer of great companies: SAP, ServiceNow, Adobe, Autodesk, Canes, and others. That layer is today's IT industry, and it is not where we want to play.

We want to play in the layer above. The layer above is a bunch of AI and these algorithms, really, we are the right company to build them. So we will build some with them, we will build some ourselves, but we will package them and deploy them at an enterprise scale. Okay. So I appreciate you raising this question.

Vivek Arya:

My name is Vivek Arya, from Bank of America Securities. Thank you, Jensen. Thank you for your presentation, Colette.

So Jensen, my question is more medium-term, about the addressable market, because your revenue growth is so rapid. Large customers account for 30%, 40%, 50%, sometimes even more of NVIDIA's revenue, but when I look at how much revenue they generate from generative AI, it is less than 10% of their sales. So how long can this gap last?

More importantly, are we at the midpoint of how much they can spend on your products? You have framed a $1 trillion market in the past, growing toward $2 trillion. Can you size how big the market is, and where we are on this adoption curve, based on how much can be monetized in the near to medium term?

Huang Renxun:

Okay. I'll give you a very concise answer first, and then I'll continue to explain in detail. It depends on the size of the market and the products we sell. Remember, we sell data centers, we just break them down. But in the end, what we sell is the entire data center. Note the last picture you saw in the keynote, it reminds us of what we actually sell. We showed a bunch of chips.

But remember, we don't actually sell those chips. The chips themselves don't work, they need to be built into a system to operate. Most importantly, the system software and ecosystem architecture are very complex. So, NVIDIA builds the entire data center for AI, and we just break it down into parts. These parts fit your company. So, that's the first point. What are we selling? Where is the opportunity?

Today, the global data center market is $1 trillion. It's $1 trillion of infrastructure, with roughly $250 billion spent annually. We sell parts of the entire data center, so the percentage we capture of that $250 billion annual market may be much higher than for a company that simply sells chips, whether GPU chips, CPU chips, or networking chips; that opportunity hasn't changed. But what NVIDIA builds is an accelerated computing platform at data center scale. Okay. So the percentage we capture of the $250 billion annual market may be much higher than before.

The second question, how sustainable is it? There are two answers. One reason you choose NVIDIA is AI. If you only manufacture TPUs, if your GPU is only used for one application, then you must rely entirely on AI. How much can you monetize from AI today?

However, our value proposition is not only AI token generation and AI model training; it is also reducing computing cost: accelerated computing, sustainable computing, energy-efficient computing. That is NVIDIA's core business. AI is just an area where we executed so well that we helped create generative AI.

Now people forget, it's a bit like our first application was computer graphics. The first application was gaming. We did so well, so passionately, that people forget we are an accelerated computing company.

They think, hey, you're a gaming company. A generation of young people grew up with us: they learned on a RIVA 128, they went to college with GeForce, and by the time they were adults they thought we were a gaming company. We have done so well in accelerated computing and AI that people think that's all we do. But accelerated computing is a $1 trillion market, $250 billion annually. Whether or not there is AI, there should be $250 billion a year for accelerated computing, just for sustainable computing, just for processing SQL, which, as you know, is one of the world's largest computational costs.

Then there is generative AI on top of that. How sustainable do I think generative AI will be? You know my view on this issue. I think we will generate words, images, videos, proteins, chemicals, dynamic actions, manipulations. We will generate predictions, bills, material lists, and so on.

Stacy Rasgon:

Hi, Jensen, Colette. Thank you. I'm Stacy Rasgon from Bernstein Research. I would like to ask about the interaction between the CPU and GPU. Most of the benchmarks you showed yesterday were for the Grace Blackwell system, which has two GPUs and one CPU, so the GPU-to-CPU ratio doubled compared to Grace Hopper.

You didn't talk much about benchmarks for standalone GPUs. Is this a shift? Are you looking at more CPU content in future AI servers? And how should I think about the interplay between the Arm CPUs you are developing and x86? It seems your future focus on x86 has decreased.

Huang Renxun:

Yes, Stacy. Thank you for your question. In fact, there is no problem with either of them; I think both x86 and Arm are perfectly fine for data centers. The reason Grace is built the way it is, is that with Arm we can shape the NVIDIA system architecture around the CPU. That way we can create something between the GPU and CPU called chip-to-chip NVLink, connecting the GPU and CPU and keeping both sides coherent, which means that when the CPU touches a cache line, the same line is invalidated on the GPU side.

So both sides can work on the same variable. Today you can't do this between x86 and a peripheral, so we can solve problems we couldn't solve before. Grace Hopper is therefore very well suited for CAE applications, which are multi-physics: some parts run on the CPU, some on the GPU. It is very well suited to different combinations of CPU and GPU.

So we can attach very large memory to each GPU, or to a pair of GPUs; data processing on Grace Hopper, for example, is very, very well suited. Okay. So it is not because of the CPU itself, but because of the system we could not otherwise build. Secondly, the chart I showed compared Hopper and Blackwell on x86 systems, the B100 and B200, with the GB200, which is Grace Blackwell. In that case, Blackwell's advantage is not because the CPU is better; it is because with Grace Blackwell we can create a larger NVLink domain, and that larger NVLink domain is really important for the next generation of AI, for the next three to five years, as far as we can see. If you really want good inference performance, you will need NVLink. That is the message I am trying to convey, and we will talk more about it.

It is now very clear that these large language models will never fit on a single GPU. Okay. But that is not even the point. To be responsive enough, and to have high enough throughput to keep costs down, you need many more GPUs than you would need just to fit the model. And to have many GPUs working together without overhead, you need NVLink. The benefit of NVLink is for inference; some people always assume the benefit of NVLink is for training.

The inference benefit of NVLink is right there in the charts: the difference between 5 times and 30 times, that extra 6 times, is all NVLink, NVLink together with the new Tensor Core. Yes, okay. So Grace lets us build the system exactly the way we need it, which is harder to do with x86. That's all. But we support both, and we will have versions of both.

And in the case of B100, it just slides into the slot where H100 and H200 sit, so the transition from Hopper to Blackwell is immediate. Once it's available, you just slide it in, and then you can figure out what the next data center needs. Okay. So we get the benefits of maximum performance at the architectural limit, as well as the benefit of an easy transition.

Matt Ramsay:

Hello everyone. I'm Matt Ramsay from TD Cowen.

Jensen, Colette, thank you, good morning, and thank you for participating. I would like Jensen to comment on a few topics I have been thinking about recently. One is the NIMs you discussed yesterday; I think of them as accelerators for specific vertical domains that can help customers get into the AI ecosystem faster. Can you briefly describe how your company is approaching the broad enterprise market and how customers can adopt AI?

The second question is about power. Our team has put a lot of effort into this recently, and I am considering whether we need to increase investment in this area. Some of the systems mentioned yesterday draw 100 kilowatts or more, and achieving this scale of computing depends on your integration work. At the same time, we are also concerned about macro-level power generation and power delivery in high-density environments. I would like to hear how your company works with the industry to supply the power these systems need.

Huang Renxun:

Alright, let me start by addressing the second question. Electricity supply, obviously, 100 kilowatts is a significant amount of electricity for a computer system, but electricity itself is a commodity, as you all know, right? The world needs far more electricity than just 120 kilowatts.

Therefore, the absolute amount of electricity is not the issue, nor is the transmission of electricity, the physical characteristics of electricity transmission, or cooling 120 kilowatts of heat. We can all agree on this, right?

So these are not physics problems, and nothing new needs to be invented. All of it comes down to supply chain planning. How important is supply chain planning? Very important. We take it very seriously and always have. We have very good partnerships across the supply chain; we value them greatly and are deeply involved, working with partners like Vertiv on cooling, and building deep partnerships with Siemens, Rockwell, Schneider, and others.

Through these collaborations, we have optimized our supply chain management, and our experience in building our own data centers has provided us with valuable practical knowledge. Starting from the first supercomputer DGX-1 in 2016, we have been building new supercomputers every year, and this year we are going to build several more. These experiences help us better understand and choose our partners.

As for NIMs, we provide two paths to help enterprise customers access AI. One is through our website and our extensive network of solution providers, which turn NIMs into applicable applications. That go-to-market includes large GSIs and smaller, more specialized GSIs, and we have many partners there.

The other, more exciting direction is tools that come with their own copilots, which I believe will bring significant change. For example, the most common tool in the world, Microsoft Office, now has copilots. Synopsys, Cadence, Ansys, all of these will have copilots in the future.

We are also building copilots for our own tools, such as ChipNeMo, developed for NVIDIA's tools.

ChipNeMo is very intelligent, able to understand NVIDIA jargon, conversations about NVIDIA chips, and how to program NVIDIA programs. Therefore, for every engineer we hire, the first thing we introduce to them is ChipNeMo, before even the restroom or cafeteria...

These copilots understand specific languages and programs, greatly improving engineers' productivity.

We are building copilots for all our tools, which is something most companies may not be able to do themselves. We can teach GSIs to do it, but for tools like Cadence's, they will build their own copilots and rent them out as engineers. I think they are sitting on a gold mine.

In the future, not only NVIDIA but other companies such as SAP will develop their own specialized copilots. In SAP's case, ABAP is a language that only SAP enthusiasts will love, but as you know, ABAP is a very important language for ERP systems worldwide; every company uses it. So now they must create a Chat ABAP, just as we created ChatUSD for Omniverse, and Siemens, Rockwell, and others will do the same.

Moreover, I believe this is another way to enter the enterprise. ServiceNow, for example, is building many copilots. I think this will be an important way for them to unlock latent value and open up a new AI-labor industry. I am extremely excited about this.

Every time I see them, I tell them: no matter where you are sitting, you are sitting on a gold mine. What I mean is, I am very excited for them.

Tim Arcuri:

Jensen, hello. I am Tim Arcuri from UBS. I also have a question about TAM, which is more about the comparison between emerging markets and mature markets, as previously, H100 was mainly for new markets. We haven't seen anyone take off A100 and replace it with H100. But for B100, is it possible to see the first upgrade in mature markets, i.e., replacing A100 with B100?

If the total market expands from $1 trillion to $2 trillion, we will face a four-year replacement cycle. This means that about $500 billion of growth will come from upgrades to existing infrastructure. Can you comment on this?

Huang Renxun:

That's a good question. Currently, we are mainly upgrading the slowest computers in data centers, namely CPUs. This is a natural process. Next, we will gradually move on to updating Amperes, and then Hoppers.

I believe that in the next five to eight years, we will start to see a cycle of updating our own infrastructure. Nevertheless, I think the current updates are not the best use of capital. After all, you also know that Amperes are very efficient.

Brett Simpson:

I am Brett Simpson from Arete Research, thank you very much for hosting this wonderful event over the past few days. I would like to ask about inference. B100 performs well in inference performance compared to H100.

What do you think the new platform means for customers in terms of total cost of ownership? And how do you think B100 will perform compared to ASICs or other inference platforms on the market? Thank you.

Huang Renxun: I believe that with the new Transformer Engine and NVLink, large language model inference on this platform is very, very difficult to beat. This is due to the high dimensionality of the problem. The TensorRT-LLM optimizing compiler I mentioned earlier, the programmable Tensor Core architecture underneath it, and NVLink allow many GPUs to work together at extremely low additional cost, so 64 GPUs perform like a single astonishing GPU, which is truly remarkable.

Without NVLink's low overhead, connecting 64 GPUs over a network such as Ethernet is not really feasible; it is essentially a waste of resources. NVLink lets all the GPUs collaborate seamlessly to generate one token at a time, which is a complex parallel computing challenge. Blackwell greatly raises the industry bar, making it difficult for ASICs to compete.
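A back-of-the-envelope model, with purely illustrative numbers, of why interconnect overhead decides whether splitting one token's work across 64 GPUs is a win or a waste:

```python
# Illustrative model: per-token latency when a model is split across N GPUs.
# Compute scales down with N, but every token still pays a communication cost per layer.
def token_latency_ms(n_gpus: int, comm_us_per_layer: float,
                     single_gpu_compute_ms: float = 50.0, layers: int = 80) -> float:
    compute = single_gpu_compute_ms / n_gpus                            # ideal compute scaling
    comm = 0.0 if n_gpus == 1 else layers * comm_us_per_layer / 1000.0  # all-reduce-style overhead, in ms
    return compute + comm

for n in (1, 8, 64):
    fast = token_latency_ms(n, comm_us_per_layer=10)    # NVLink-class overhead (assumed)
    slow = token_latency_ms(n, comm_us_per_layer=150)   # commodity-network-class overhead (assumed)
    print(f"{n:>2} GPUs: fast interconnect {fast:6.2f} ms/token, slow {slow:6.2f} ms/token")
```

With the assumed numbers, the slow interconnect erases most of the benefit of adding GPUs, which is the point Huang Renxun makes about NVLink versus Ethernet for inference.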

C.J. Muse: Hello, Jensen and Colette, I am Cantor's C.J. Muse. Thank you for the invitation, it's nice to meet you both. I am curious about your pricing strategy. Historically, you mentioned a strategy of "the more you buy, the more you save."

However, it seems that Blackwell's pricing is somewhat favorable compared to the efficiency it provides. My question is, considering the possible adoption of a "razor and razor blade" sales model (selling software and a complete system), how do you adjust your pricing strategy? How should we view the normalization of profit margins in this scenario?

Huang Renxun: Our pricing has always been based on total cost of ownership (TCO). Thank you for your question, C.J. Our starting point has always been TCO, but we also want to make it affordable for the majority of key users. If the customer base is specific to a particular field, such as molecular dynamics, targeting only one application, we adjust the TCO accordingly. For example, for medical imaging systems the TCO may be very high, but the market is relatively small.

As the market expands, we hope to make Blackwell more affordable to a wider market. This is actually a self-balancing issue. As we address the TCO issue for a larger market, some customers may derive more value from it, but this is acceptable. We aim to streamline the business and provide a basic product to support a broad market. If the market becomes more segmented in the future, we can segment the market, but we are not at that stage yet. Therefore, we have the opportunity to provide very high value to the masses, offering excellent value to everyone, which is our goal.

Joseph Moore: Hello everyone, I am Joseph Moore from Morgan Stanley. I noticed that the specifications of the GB200 series products you introduced are very impressive, and you mentioned that this is due to a larger NVLink domain. Can you provide a detailed comparison between the GB200 series and the GH200 series? And why do you think the GB200 will be the stronger product in the market?

Huang Renxun: Great question. In simple terms, the GH200 series (including 100, 200, Grace Hopper versions) was released before the more advanced Grace Blackwell series was widely popularized.

Furthermore, the Grace Hopper series carries additional burdens compared to the Hopper series. The Hopper series followed the Ampere series, progressing from A100 to H100, then to B100, and so on.

Therefore, this product line is already quite mature, and we will continue to develop in this direction. We have developed compatible software for these products, and users are already familiar with its operation.

The Grace Hopper series is different as it addresses new application scenarios that we previously did not cover well, such as multi-physics problems that require close collaboration between CPU and GPU, handling large datasets, and other challenges. The Grace Hopper series excels in these aspects. We have started developing software tailored for this series.

I currently recommend that most customers focus directly on the Grace Blackwell series. Therefore, regardless of how they are currently using the Grace Hopper series, it will be fully compatible with the Grace Blackwell series. This is a great advantage. Even if they have chosen the Grace Hopper series now, it is still a good choice, but I recommend investing more effort into the Grace Blackwell series because of its superior performance.

Unknown Analyst: Jensen, Colette, thank you for your insights today. My question is about robotics technology. It seems that every time we attend GTC, you reveal some surprise at the end, and years later we find you had been discussing that topic for a long time.

I understand that you mentioned that robotics technology may be approaching its ChatGPT moment. Could you explain what this means and how you see robotics technology gradually integrating into our daily lives?

Huang Renxun: First of all, thank you for your question. Two years ago, I presented the Earth-2 project. Two years later, we developed a new algorithm that can achieve a 3 km resolution regional weather forecast, requiring a supercomputer that is 25,000 times more powerful than the current one used for weather simulation. This resolution allows us to predict the weather more accurately.

Moreover, weather forecasting requires considering a large number of variables, because we need to simulate the distribution of different parameters to predict weather patterns. Due to the massive computational resources required, conventional methods struggle to run many simulations. We solved this by training AI to understand physical laws, which lets us provide regional weather forecasts worldwide. With the help of AI, we have run approximately 10,000 weather simulations.

Two years ago, I demonstrated this AI model, and today we have connected it to the most reliable weather data source in the world, The Weather Company. So we will help people around the world with regional weather forecasts. This technology can be of great help to shipping companies, insurance companies, or regions that frequently face typhoon and hurricane threats. In fact, we went through this ChatGPT moment a few years ago.

Taking a step back, ChatGPT is truly incredible: by learning from a large number of human examples, it can understand and generate contextually relevant content. In the same way, by tokenizing actions and learning what specific actions mean, a model can generate raw action tokens and understand and imitate motion.

The greatness of ChatGPT lies in continuously improving through reinforcement learning and human feedback. It will try to do one thing. You say this is better than that. It will try to do something else. You say: no, this is better than that. Human feedback, reinforcement learning, it will accept that reinforcement and improve itself.

So, what is the purpose of Omniverse? Omniverse is a platform that applies the concept of physical simulation feedback in the physical world to train AI through simulating interactions with various processes in the physical world. Have you caught up with my thinking? In short, we are leveraging the same set of principles and concepts to drive the widespread application of AI technology in different scenarios.

Of particular note is Isaac Sim, a robot training and simulation system built on the Omniverse platform, which has been very successful for anyone in the industry. We have built a new "operating system" for robots.
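A toy, self-contained sketch of the "simulate, get physics feedback, reinforce" loop described above; a real setup would use Isaac Sim's APIs and a proper reinforcement learning algorithm, but the structure is the same: propose an action, try it in simulation, keep what the feedback rewards.

```python
# Toy illustration of learning from simulated feedback (not the Isaac Sim API).
import random

random.seed(0)
TARGET = 7.3  # hypothetical goal, e.g. a joint angle to reach (arbitrary units)

def simulate(action: float) -> float:
    """Stand-in 'physics simulator': returns a reward that is higher near the target."""
    return -abs(TARGET - action)

best_action, best_reward = 0.0, simulate(0.0)
for _ in range(200):
    # Propose an action near the current best, try it in simulation,
    # and keep it if the simulated feedback (reward) improves.
    candidate = best_action + random.gauss(0, 1.0)
    reward = simulate(candidate)
    if reward > best_reward:
        best_action, best_reward = candidate, reward

print(f"learned action ~ {best_action:.2f} (target {TARGET})")
```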

Atif Malik:

Hello, I am Atif Malik from Citigroup. You mentioned that the Blackwell platform will officially ship later this year; can you specify which quarter? First quarter or third quarter?

In addition, regarding the supply chain readiness for new products, especially the packaging of B200 CoWoS-L, how have you arranged it?

Colette Kress:

Let me address the second question first. Regarding supply chain readiness, we have been preparing for the launch of these new products for over a year. We are honored to have worked with our partners to jointly develop the supply chain and continuously improve its resilience and risk tolerance. You are correct: we are working through CoWoS, new memory technologies, and the large number of complex components that go into what we build. This work is progressing steadily and will be ready when the products launch.

Furthermore, we are also working with partners to make sure liquid cooling systems and data center build-outs are ready. This is crucial for our plans and for integrating all the Blackwell configurations. As for launch timing, we expect to see the products on the market later this year. We have had discussions with multiple customers about designs and specifications, and their feedback on requirements has been very helpful for our supply chain readiness and production planning. While there may be some supply constraints early on, we are committed to meeting market demand.

Huang Renxun: Indeed. Hopper and Blackwell are both designed to support current operational needs. Demand for Hopper remains strong, and many customers are already familiar with Blackwell. We aim to share this information with customers early to help them plan their data centers.

Pierre Ferragu:

I am Pierre Ferragu from New Street Research. I would like to ask about the technical aspects of Blackwell, especially how the 10 TB/s data transfer between the two chips is achieved. What are the technological and manufacturing challenges behind it?

Looking ahead, do you think we will see more chips integrated into a single package in the future? Also, considering the advancements in AI models, what is your view on the future direction of GPU architecture?

Huang Renxun:

I will start with the second question. As the foundational platform for essentially all AI research, we are fortunate to see upcoming research advances early. Of course, the goal of every next-generation model is to push the limits of current-generation systems: extremely large context windows, state space models, synthetic data generation (essentially models talking to themselves), reinforcement learning, the AlphaGo of large language models, tree search. These models will need to learn how to reason and to plan over multiple paths.

So, rather than making a single attempt, this is a bit like the careful planning we do when we think. That planning system, that multi-step reasoning system, may be very abstract, and the planned path may be very long, just like playing Go, except the constraints are far more complex. This entire research field is extremely exciting.
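One way to picture multi-path planning is a simple beam search over candidate continuations: instead of committing to a single greedy step, keep the highest-scoring partial plans at each depth. The sketch below uses stand-in `expand` and `score` functions; in a real system an LLM would propose continuations and a value model would score them.

```python
import heapq

def plan_with_tree_search(expand, score, root, depth=3, beam=4):
    """Keep the `beam` best partial plans at every depth rather than
    committing to a single greedy continuation."""
    frontier = [(score([root]), [root])]
    for _ in range(depth):
        candidates = []
        for _, path in frontier:
            for nxt in expand(path[-1]):
                new_path = path + [nxt]
                candidates.append((score(new_path), new_path))
        frontier = heapq.nlargest(beam, candidates, key=lambda c: c[0])
    return max(frontier, key=lambda c: c[0])[1]

# Stand-ins: expand proposes next steps, score rates whole plans.
best_plan = plan_with_tree_search(
    expand=lambda node: [node + 1, node + 2, node + 3],
    score=lambda path: -abs(10 - sum(path)),   # prefer plans whose steps sum to 10
    root=0,
)
```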

In the coming years, the types of systems we will witness will be unimaginable compared to today, for the reasons I described. Although some are concerned about the amount of internet data available to train these models, this is not actually a problem.

Ten trillion tokens is already plenty, and don't forget synthetic data generation, model-to-model dialogue, and reinforcement learning: the amount of data you generate means you will have two computers training each other. Today one computer trains on data; tomorrow it will be two computers. Remember that.
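A rough sketch of that "two computers" loop, with hypothetical `generator` and `critic` callables standing in for two large models: one proposes answers, the other scores them, and only the highest-rated pairs are kept as synthetic training data for the next round.

```python
def self_improvement_round(generator, critic, prompts, threshold=0.8):
    """One round of models training each other: generate, score, and keep
    only high-quality pairs as new synthetic training data."""
    synthetic_data = []
    for prompt in prompts:
        answer = generator(prompt)
        if critic(prompt, answer) >= threshold:   # keep only well-rated samples
            synthetic_data.append((prompt, answer))
    return synthetic_data  # the generator would then be fine-tuned on this data

# Stand-in callables; in practice both would be large language models.
data = self_improvement_round(
    generator=lambda p: p.upper(),
    critic=lambda p, a: 1.0 if a == p.upper() else 0.0,
    prompts=["explain nvlink", "what is cowos"],
)
```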

AlphaGo was multiple systems competing with each other, so we can do this as fast as possible. We are about to witness some truly exciting breakthrough work, and for these reasons we want our GPUs to be larger in scale. Our SerDes is absolutely world-class. Its data rate and energy per bit are unparalleled, and that is why we are able to build NVLink.

Remember, NVLink came into being because we couldn't make chips big enough, so we connected the chips together. This was in 2016. NVLink has now developed to the fifth generation. The rest of the world has not even reached the first generation of NVLink. With our independently developed fifth-generation NVLink technology, we have achieved seamless connections between up to 576 chips, significantly improving data communication efficiency and making it possible to build ultra-large-scale computing systems.

Do data centers that large need every chip connected so tightly? Not necessarily. Dividing them into groups of 576 works fine, and SerDes energy consumption is already very low. Now we can place chips closer together, and we want to do this because then the software cannot perceive the difference.

When scaling up, the algorithm is: build the largest chip lithography can achieve, then connect multiple chips using whatever technology is feasible. But first, you must build the largest chip ever made. Otherwise, why didn't we simply combine multiple smaller chips in the past? We have always pushed single-chip technology, because the data transfer rate and energy consumption inside a chip keep the programming model as unified as possible and avoid the so-called NUMA (Non-Uniform Memory Access) phenomenon.

Therefore, there will be no NUMA behavior, no strange cache behavior, no memory locality behavior, all of which could cause the program to work differently depending on the node it runs on. We want our software to perform consistently wherever it runs.

So, the first thing you need to do is manufacture the largest chip lithography allows. That is the first Blackwell chip. Then we connected two of them together. The 10 TB/s link is insane; no one has seen a 10 TB/s connection before. And it consumes very little power, otherwise it would be just another interconnect. That is the first problem that had to be solved.

The next problem to solve is the CoWoS packaging mentioned earlier. We have adopted the largest-capacity CoWoS packaging in the world, which not only significantly improves product performance but also ensures the stability and reliability of the supply chain during large-scale production.

The last surge in demand was quite sudden, but this time we have enough foresight. Therefore, Colette is absolutely right. We work closely with the supply chain and have a close partnership with TSMC. We are prepared for the exciting growth ahead.

Aaron Rakers: Thank you, I am Aaron Rakers from JP Morgan. Thank you very much for the detailed sharing. I would like to follow up on your earlier comments on Ethernet and the discussion around Ultra Ethernet.

Huang Renxun: I am very optimistic about Ethernet technology.

Aaron Rakers: Yes. I am interested in how NVLink interconnects 576 GPUs. How does this architecture fit with the evolution of Ethernet, your Spectrum-4 product, and the move toward 800 Gbps? In other words, will NVLink compete with Ethernet in certain scenarios?

Huang Renxun: No. First, the algorithm for building at large scale is actually very simple: build the largest possible chip, and the chips we produce have already reached the size limit. Second, connect two chips as tightly as possible. Once you connect two chips, you start to face challenges such as NUMA effects and locality effects, and that is where NVLink becomes crucial.

With NVLink, we can build the largest link network that cost and power allow. We insist on using copper instead of fiber to connect up to 576 GPUs (effectively one giant chip), which keeps scaling energy-efficient and affordable. However, 576 GPUs alone are far from enough; we need interconnects beyond that.

At that level, InfiniBand is the best choice, followed by Ethernet combined with an accelerated computing layer, namely Spectrum-X. This lets us manage data flow inside the system effectively, avoid delays, and optimize overall computing speed. Each technology has its own application scenarios, and our demand for optics remains very high, so there is no need to worry about demand for optical technology.

Will Stein: Regarding the UAE sovereign AI project, can you provide specific details on how NVIDIA plans to operate? Also, how can we explain accelerated computing to the older generation, such as my 91-year-old mother?

Huang Renxun: For the second question, when explaining accelerated computing, you can use the metaphor of "using the right tool for the job." Traditional general-purpose computing is like using the same screwdriver for everything, from brushing your teeth in the morning to going to bed at night. Over time, as human ingenuity accumulates, we make that general tool more versatile, adding brushes, bristles, and so on to the screwdriver.

CPUs perform well at sequential tasks but are not good at parallel processing. For most applications, such as Excel and most personal computer software, CPU performance is already sufficient. But in newer areas like computer graphics and video games, 1% of the code accounts for 99% of the runtime. So we created hardware that excels at that 1% of the code, even if it does poorly on the remaining 99%.
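The arithmetic behind that claim is a back-of-the-envelope Amdahl's-law estimate. If the hot 1% of the code accounts for 99% of the runtime and an accelerator makes that portion 100x faster (numbers chosen purely for illustration), the whole application still runs roughly 50x faster even though 99% of the code is untouched:

```python
def overall_speedup(hot_fraction, hot_speedup):
    """Amdahl's-law style estimate when only the "hot" fraction of runtime is accelerated."""
    return 1.0 / ((1.0 - hot_fraction) + hot_fraction / hot_speedup)

# 99% of runtime accelerated by 100x -> about 50x overall.
print(overall_speedup(hot_fraction=0.99, hot_speedup=100))   # ~50.25
```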

This is why we have built accelerated computing for fields such as molecular dynamics, medical imaging, seismic processing, and artificial intelligence. Acceleration dramatically increases processing speed, which is why these workloads, along with data processing, see such large performance gains.

Thank you. We are grateful for everyone's support and attention. We are at a special moment, witnessing a major turning point in the history of technology: the transformation of computing and the advent of a new era of software. The next decade will be crucial for all of us, and we look forward to facing the challenges together with you and creating a better future.