The hottest chip research institution! SemiAnalysis founder: the computing power bottleneck has shifted from CoWoS to EUV, and memory will consume 30% of capital expenditure

Wallstreetcn
2026.03.16 03:43

The founder of SemiAnalysis said that the bottleneck on AI computing power expansion keeps shifting like a game of "whack-a-mole," and may now be returning to chip manufacturing. Demand for high-bandwidth memory (HBM), ignited by inference models, consumes four times the wafer area of ordinary DRAM, a squeeze that could halve smartphone shipments. By 2026, 30% of tech giants' capital expenditure will be consumed by memory. Power is not the ultimate constraint, since "behind-the-meter" options such as modified aircraft engines and fuel cells can cover it. The ultimate ceiling may arrive after 2028, because ASML produces fewer than 100 EUV lithography machines per year.

As the AI investment frenzy sweeps through the tech industry, where are the real constraints on computing power expansion? The answer given by Dylan Patel, founder of SemiAnalysis, is: the bottleneck is constantly changing.

Recently, in a podcast interview, Dylan Patel systematically explained the supply chain logic behind the expansion of AI computing power. He pointed out that the limiting factors for AI computing power have been changing over the past few years, much like a game of whack-a-mole; when one bottleneck is resolved, a new one emerges.

SemiAnalysis is a semiconductor research organization that has rapidly gained popularity in the tech and investment circles in recent years, with its research widely used by AI companies, cloud computing providers, and hedge funds.

The Bottlenecks of AI Computing Power Expansion are Constantly Changing

Patel stated that the bottlenecks in the AI industry chain have changed almost every year over the past few years.

He said, "A few years ago, the limitation on AI expansion was CoWoS packaging; last year it was power; and then it was data centers."

However, as these areas gradually expand production, new constraints begin to emerge.

Patel described this change: "The bottlenecks in computing power expansion are always moving. You solve one problem, and a new problem will emerge from another position in the supply chain."

Behind this change, the essence is that the growth rate of AI demand far exceeds the expansion rate of the industry chain.

Bottlenecks are Returning to Chip Manufacturing

Patel believes that as infrastructure such as data centers and power gradually expands, the core limitation of AI computing power is returning to the semiconductor manufacturing stage.

He said, "The biggest bottleneck is actually the computing power itself, and the longest-term supply chain for computing power is not power or data centers, but the semiconductor supply chain."

Specifically, the key constraints in the chip supply chain mainly include three parts:

  • Logic chip capacity (wafer fabrication capacity)

  • High Bandwidth Memory (HBM) and other memory chips

  • Wafer fabrication construction and equipment cycles

In contrast, the construction speed of data centers is significantly faster. This means that when AI demand suddenly surges, the chip supply chain often struggles to keep up. He stated:

In the wafer fabrication field, clean rooms are the biggest bottleneck this year and next. As we move into 2028, 2029, and 2030, there will still be constraints there.

Future Bottlenecks May Shift Further Upstream to the Equipment Level

If AI computing power continues to grow rapidly, Patel believes that supply chain bottlenecks will keep migrating further upstream. Ultimately, the limit on computing power expansion may be the production capacity of semiconductor equipment itself.

He specifically mentioned extreme ultraviolet lithography (EUV) machines. These devices are manufactured by ASML and are core equipment for advanced chip production.

Patel said: "To further expand computing power, there will be different bottlenecks this year and next, but ultimately by 2028 or 2029, the bottleneck will fall to the bottom layer of the supply chain, which is ASML." Current global annual production of EUV lithography machines is about 70 units, which may rise to around 80 in the coming years; even with supply chain expansion, it will be difficult to exceed 100 units by the end of this decade.

In this context, equipment capacity may become the ultimate constraint on the expansion of AI computing power.

$1.2 billion lithography machines choke a $50 billion throat

To visually demonstrate the control of lithography machines over the overall situation, Patel calculated a striking figure.

Assuming a data center with 1 GW (gigawatt) of computing power is built using NVIDIA's next-generation Rubin chip, the entire semiconductor supply chain would need to consume: approximately 55,000 3nm wafers, 6,000 5nm wafers, and 170,000 DRAM storage wafers.

The manufacturing of these wafers requires about 2 million EUV exposures. Based on the throughput of a single EUV lithography machine, about 3.5 EUV machines are needed.

This creates an extremely distorted leverage effect: Building a 1 GW data center requires a massive capital expenditure of about $50 billion; yet, supporting this $50 billion capacity is merely 3.5 EUV lithography machines valued at about $1.2 billion.
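To make the leverage arithmetic concrete, here is a minimal sketch in Python using only the figures quoted above (the ~$350 million per-tool price is our midpoint of the $300-400 million range Patel cites later):

```python
# Sanity check of the leverage figures quoted above (inputs are the
# article's numbers, not independent data).
euv_tools_per_gw = 3.5           # EUV tools needed per 1 GW of Rubin capacity
euv_tool_price = 0.35e9          # ~$350M each, midpoint of the quoted $300-400M
datacenter_capex_per_gw = 50e9   # ~$50B of capex per 1 GW data center

tool_capex = euv_tools_per_gw * euv_tool_price
print(f"EUV tool capex per GW: ${tool_capex / 1e9:.2f}B")              # ~$1.2B
print(f"Leverage ratio: {datacenter_capex_per_gw / tool_capex:.0f}x")  # ~41x
```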

Since EUV lithography machines are the most complex machinery ever manufactured by humans, their core components (such as Carl Zeiss's lens assembly and Cymer's extreme ultraviolet light source) have a highly rigid supply chain. Even under the most aggressive expansion assumptions, ASML's current annual production capacity is about 70 units, increasing to 80 units next year, and barely breaking 100 units by 2030. This physically locks in the maximum annual increase in global AI computing power.

Storage squeeze: Consumers will pay for AI

In addition to logic chips, memory shortages will be the core trading theme in the next year or two. Patel provided a chilling prediction for the consumer electronics market: By 2026, about 30% of the capital expenditure of tech giants will flow into memory chips.

Long-context reasoning models require an enormous KV cache (key-value cache), which has ignited demand for memory bandwidth and capacity. HBM (high-bandwidth memory), for example, occupies roughly four times the wafer area of ordinary DDR memory. This means that for every byte of AI memory produced, foundries must give up about four bytes of consumer electronics memory capacity.

"People will increasingly dislike AI. Because smartphones and PCs will not get better year by year; in fact, they will get worse," Patel bluntly stated.

As large amounts of DRAM capacity are claimed by AI chips, which carry fatter margins and long-term contracts, the BOM (bill of materials) cost of consumer electronics will soar. Patel estimates that the memory cost of an Apple iPhone may increase by about $150. Apple may be able to absorb or pass on this cost through its brand premium, but mid-to-low-end phones that compete on value will take a devastating blow. Patel expects that as memory prices double or even spike, global smartphone shipments, originally projected at 1.4 billion units per year, may drop to 800 million units this year and could even fall to 500-600 million units next year.
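The squeeze reduces to simple arithmetic. A minimal sketch using only the figures above (the 4x wafer-area multiple and the shipment numbers are Patel's; taking the 500-600 million downside at its midpoint is our choice):

```python
# Toy illustration of the DRAM squeeze described above; inputs are the
# article's figures, the midpoint choice is ours.
hbm_wafer_area_multiple = 4    # HBM uses ~4x the wafer area of plain DRAM per byte
consumer_bytes_displaced = 1 * hbm_wafer_area_multiple  # per byte of HBM built

shipments_projected = 1.4e9    # original annual smartphone projection, units
shipments_downside = 0.55e9    # midpoint of Patel's 500-600M downside case

print(f"Consumer DRAM displaced per HBM byte: {consumer_bytes_displaced}x")
print(f"Implied shipment decline: {1 - shipments_downside / shipments_projected:.0%}")  # ~61%
```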

Electricity is not an absolute constraint; space data centers are premature

In response to ongoing market discussion of the "power crisis" and Elon Musk's proposal for "space data centers," Patel took a pragmatic stance. He believes that electricity will not become the ultimate constraint; rather, it is a good business.

"Clearly, there are only three companies in the world that can manufacture combined cycle gas turbines, but there is still much we can do."

Patel pointed out that by adopting modified aircraft engines (aero-derivative turbines), medium-speed reciprocating engines (such as heavy-truck or marine engines), Bloom Energy's fuel cells, and "solar + battery" combinations, data centers can solve their energy problem entirely "behind the meter" (independent of the main grid).

Even if this leads to a doubling of the electricity price per kilowatt-hour, when allocated to the total cost of ownership (TCO) of an H100 at $1.40 per hour, it only adds a few cents. Compared to the enormous profits generated by AI models, this increase is negligible. Moreover, with sufficient utility-scale energy storage systems, the U.S. grid could release an additional 20% of its capacity for data center use.
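Patel's "few cents" claim can be checked with a back-of-envelope calculation. A sketch under stated assumptions: the ~1.4 kW per-GPU draw (including cooling and facility overhead) and the $0.05/kWh baseline rate are illustrative inputs of ours; the $1.40/hour TCO is from the article:

```python
# Back-of-envelope check of the "few cents" claim. The power draw and
# baseline electricity rate are illustrative assumptions, not sourced.
h100_tco_per_hour = 1.40   # $/GPU-hour, amortized all-in cost (from the article)
power_per_gpu_kw = 1.4     # assumed draw incl. cooling and facility overhead
baseline_rate = 0.05       # assumed industrial rate, $/kWh

# Doubling the electricity price adds exactly one extra baseline rate:
extra_cost = power_per_gpu_kw * baseline_rate
print(f"Added cost if electricity doubles: ${extra_cost:.2f}/GPU-hour")  # ~$0.07
print(f"As a share of TCO: {extra_cost / h100_tco_per_hour:.0%}")        # ~5%
```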

As for Musk's idea of establishing data centers in space, Patel bluntly denied it. The extremely high failure rate of chips (about 15% of Blackwell units need to be returned or reinserted) and the expensive costs of space laser communication make this concept economically illogical. "At least within this decade, space data centers will not happen."

The full interview translation is as follows:

Timestamps

(00:00:00) – Why today's H100 is worth more than three years ago

(00:24:52) – NVIDIA locked in TSMC's capacity early; Google is under pressure

(00:34:34) – By 2030, ASML will become the number one constraint on AI computing power expansion

(00:55:47) – Can't we directly use TSMC's old fabs?

(01:16:01) – The impending massive memory crisis

(01:42:34) – Expanding electricity supply in the U.S. will not be a problem

(01:54:44) – Space GPUs are unlikely to be realized within this decade

(02:14:07) – Why aren't more hedge funds participating in AGI investments?

(02:18:30) – Will TSMC push Apple off the N2 process?

00:00:00 – Why Today's H100 is Worth More Than Three Years Ago

Dwarkesh Patel

Alright, this episode is my roommate teaching me about semiconductors.

Dylan Patel

This is also a farewell to this current microphone.

Dwarkesh Patel

That's right. After you’re done with it, I thought, "I can't use this anymore. I need to get a new one."

Dylan Patel

Dwarkesh doesn't use second-hand stuff.

Dwarkesh Patel

Dylan is the CEO of SemiAnalysis. Dylan, I have an urgent question for you. If we combine the four giants—Amazon, Meta, Google, Microsoft—according to your recently released data, their total projected capital expenditure this year is $600 billion. Based on the annual price of renting this computing power, that’s close to 50 gigawatts. Clearly, we can't increase power capacity by 50 gigawatts this year, so this money is likely paying for computing power that will come online in the next few years. How should we view the timeline for this capital expenditure coming online?

The same question applies to AI labs. OpenAI just announced they raised $110 billion, and Anthropic just announced they raised $30 billion. If you look at the computing power they are going to bring online this year—you should tell me the specifics—but isn't the total also around another 4 gigawatts? The computing power OpenAI and Anthropic need to rent this year to sustain their buildout costs $10 to $13 billion per gigawatt per year. These financing rounds alone are enough to cover their computing power spending for the entire year, and that doesn't even count the revenue they will earn this year.

So please help me understand: First, what is the actual timeline for the capital expenditures of these large tech companies to come online? Second, if the annualized cost of building a 1-gigawatt data center is $13 billion, then what is the purpose of these AI labs raising so much money?

Dylan Patel

So when we talk about these hyperscale cloud service providers' capital expenditures reaching the scale of $600 billion, and then look at other parts of the supply chain, the total will be close to $1 trillion. Part of this is for computing power coming online this year: that is, chips and some other capital expenditures paid for this year. But a large part of it is for upfront construction investments.

When we talk about the U.S. increasing capacity by 20 gigawatts this year, part of that is not this year's expenditure. In fact, some capital expenditures were already spent in the previous year. Looking at Google's $180 billion capital expenditure, a large part of it is for turbine prepayments for 2028 and 2029. Some is for data center construction in 2027. Some is for power purchase agreements, down payments, and other things they are doing for longer-term planning to enable this super-fast expansion. This applies to all hyperscale cloud service providers and other participants in the supply chain.

Therefore, this year approximately 20 gigawatts will be deployed, a significant portion of which comes from hyperscale cloud service providers, while some does not. For all these companies, their biggest clients are Anthropic and OpenAI. Currently, Anthropic and OpenAI have about 2 to 2.5 gigawatts of computing power, and they are trying to scale up to a larger size.

Looking at what Anthropic has done in the past few months, increasing revenue by $4 billion or $6 billion, we can simply draw a straight line and predict that they will add another $6 billion in revenue each month. Some might argue that this is already a conservative estimate, believing they should grow faster. This means that over the next ten months, they will add $60 billion in revenue. According to the last reported gross margin for Anthropic, this means they need to invest about $40 billion in inference computing for this $60 billion in revenue.

This $40 billion in computing costs, calculated at about $10 billion per gigawatt in leasing costs, means they need to add 4 gigawatts of inference capacity just to support the revenue growth. This assumes their R&D training clusters remain unchanged. In a sense, Anthropic needs to reach well above 5 gigawatts by the end of this year. This will be very difficult for them to achieve, but it is possible.
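Laid out step by step, Patel's chain of reasoning looks like this (a sketch; every input is a figure he quotes in this answer, and the cost-share step is implied by his $40 billion compute figure against $60 billion of revenue):

```python
# Patel's Anthropic back-of-envelope, spelled out. All inputs are his
# quoted assumptions, not reported financials.
monthly_revenue_added = 6e9   # ~$6B of new revenue per month, extrapolated
months = 10
new_revenue = monthly_revenue_added * months       # $60B over ten months

inference_cost_share = 40e9 / 60e9                 # implied by his $40B figure
inference_spend = new_revenue * inference_cost_share

lease_cost_per_gw_year = 10e9                      # ~$10B per GW per year leased
extra_gw = inference_spend / lease_cost_per_gw_year
print(f"New revenue: ${new_revenue / 1e9:.0f}B -> inference spend: ${inference_spend / 1e9:.0f}B")
print(f"Added inference capacity needed: ~{extra_gw:.0f} GW")  # ~4 GW
```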

Dwarkesh Patel

Can I ask a question about this? If Anthropic's goal by the end of this year is not to reach 5 gigawatts, but they need that much computing power to service revenue that grows crazily beyond expectations—and possibly even more—while also conducting research and training to ensure next year's models are good enough, where will this computing power come from?

Dylan Patel

Dario (CEO of Anthropic) was very conservative the last time he was on your podcast. He said, "I won't go crazy on computing power because if my revenue grows at different rates at different times... I don't want to go bankrupt. I want to make sure we are responsible as we scale." But in fact, compared to OpenAI, he messed up. OpenAI's approach is, "We just signed these crazy agreements."

By the end of this year, the computing power obtained by OpenAI will far exceed that of Anthropic. What must Anthropic do to obtain computing power? They have to look for some lower-quality suppliers, ones they wouldn't have considered before. Historically, Anthropic has had the best suppliers, like Google and Amazon, which are the largest companies in the world. Now Microsoft is expanding its supply chain, and Anthropic has to look for other newer participants.

OpenAI has been more proactive in collaborating with multiple participants. Yes, they have obtained a lot of computing power from Microsoft, Google, and Amazon, but they have also obtained a significant amount of computing power from CoreWeave and Oracle. They also went to find some unknown companies, or rather companies that people would consider random, such as SoftBank Energy, which has never built a data center but is now constructing one for OpenAI. They also looked for many other companies, like NScale, to obtain computing power.

This is a dilemma for Anthropic because they are too conservative with their computing power and do not want to invest recklessly. In a sense, much of the financial panic in the second half of last year was because, "OpenAI signed all these agreements, but they don't have the money to pay for them..." So Oracle's stock plummeted, and CoreWeave's stock plummeted. All these companies' stocks fell sharply, and the credit market went crazy because people thought the ultimate buyer couldn't afford to pay. Now it's, "Oh wait, they raised a huge amount of money. Well, then they can afford it."

Anthropic, on the other hand, is much more conservative. They think, "We will sign contracts, but with principles. We will intentionally set our targets lower than what we might achieve, staying conservative because we don't want to risk bankruptcy."

Dwarkesh Patel

What I want to understand is, "What does it really mean to acquire computing power in an emergency?" Is it necessary to look for those "neoclouds"? Is their computing power worse? In what way? Is it because they come in at the last minute and have to pay higher margins to cloud service providers? Who built this idle capacity that allows Anthropic and OpenAI to call upon it temporarily? If OpenAI's computing power scale is similar to Anthropic's by 2027, what is the actual advantage they gain now? Will there be a difference in their computing power (in gigawatts) by the end of this year? If so, how many gigawatts will Anthropic and OpenAI have by the end of this year?

Dylan Patel

Yes, to acquire excess computing power, there is capacity available at the hyperscale cloud service providers. Not all computing power contracts are long-term five-year agreements. Some computing power comes from contracts signed in 2023 or 2024, or from H100 contracts for 2025, which are shorter in duration. The vast majority of OpenAI's computing power is signed under five-year contracts, but many other customers have signed short-term contracts or on-demand contracts for 1 year, 2 years, 3 years, or 6 months.

As these contracts expire, who are the market participants most willing to pay high prices? In this sense, we see that the price of H100 has already risen significantly. People are willing to sign long-term contracts above $2 (per hour). I have seen some deals where certain AI labs—I'm being vague for reasons—signed H100 contracts lasting 2 to 3 years at prices as high as $2.40. If you consider the profit margins, the cost of building Hopper (amortized over five years) is $1.40. Now, two years have passed, and you are still signing 2-3 year contracts at a price of $2.40? Those profit margins are much higher.

Now you can squeeze out all other suppliers, whether owned by Amazon or companies like CoreWeave, Together AI, Nebius, and so on. These new cloud companies typically have a higher proportion of Hopper because they are more aggressive on Hopper. They also tend to sign shorter contracts, not referring to CoreWeave, but others. So, if someone wants Hopper, there is still some capacity available in the market.

Additionally, while at companies like Oracle or CoreWeave, most of Blackwell's capacity is contracted through long-term agreements, any capacity coming online this quarter has already been sold. In some cases, they haven't even reached all the numbers they promised to sell due to delays in some data centers, not just these two, but also Nebius, Microsoft, Amazon, and Google have delays. However, there are many new cloud companies, as well as some hyperscale cloud service providers, that either have capacity under construction that hasn't been sold yet or capacity originally intended for internal uses that are not super AGI focused, which may now be sold instead.

Or, in the case of Anthropic, they don't necessarily have to own all the computing power directly. Amazon can own the computing power and provide services through Bedrock, Google can provide services through Vertex, Microsoft can provide services through Foundry, and then share revenue with Anthropic, and vice versa.

Dwarkesh Patel

Basically, you are saying that Anthropic now has to pay either a 50% markup (in the form of revenue sharing) or last-minute spot pricing for computing power, which they could have avoided if they had purchased the computing power in advance.

Dylan Patel

Right, there is a trade-off here. But at the same time, for a full four months, everyone has been telling OpenAI, "We won't sign a deal with you." It sounds crazy, but the reason is "you don't have the money." Now everyone is saying, "OpenAI, we always believed in you. Now that you've raised so much money, we can sign any deal." In this sense, Anthropic is constrained. There aren't many new buyers willing to purchase computing power because Anthropic has first reached that level of explosive revenue growth.

Dwarkesh Patel

This is interesting. Otherwise, you might think that having the best model is a rapidly depreciating asset because you might not be the best three months later. But importantly, you can lock in computing power in advance and get a better price. Maybe this is an obvious point. But at least until recently, people have been emphasizing the depreciation cycle of GPUs. Bears, like Michael Burry, would say, "Look, people say these GPUs can last four or five years. Maybe because technology is advancing too quickly, it actually makes sense to adopt a two-year depreciation cycle for these GPUs," which would increase the amortized capital expenditures reported that year, making the construction of all these cloud services economically less viable.

But you actually pointed out that the depreciation cycle may be longer than five years. If we are still using Hoppers—especially if AI really takes off, and by 2030 we are still saying, "We need to get the 7nm wafer fab up and running, we need to reactivate the A100,"—then the depreciation cycle could actually be very long. I think that's an interesting financial implication you mentioned.

Dylan Patel

There are several clues to explore here. One is how GPU depreciation will work. I think I didn't answer your question before, which is that I believe Anthropic will reach around 5 gigawatts by the end of this year, maybe a bit more, which includes their own computing power as well as services provided through Bedrock, Vertex, or Foundry. I think they can reach 5 or 6 gigawatts, which is much higher than their initial plan. According to our data, OpenAI is roughly the same, actually slightly higher.

Anyway, the depreciation cycle of GPUs. Michael Burry says it's three years or less. That's his general view. There are two angles to look at this issue. Mechanically, there is a TCO (Total Cost of Ownership) model for GPUs, based on which we predict GPU pricing and calculate the total cost of clusters. There are many costs involved: your data center costs, network costs, on-site operational personnel costs for replacing equipment, spare parts costs, actual chip costs, server costs. All these costs add up. They each have their own depreciation cycles and credit costs.

Ultimately, it calculates that "if the depreciation period is five years, the cost of large-scale deployment of H100 is $1.40 per hour." If you sign a five-year contract at $2 per hour, your gross margin is about 35%, slightly higher. If you sign at $1.90, the gross margin is also about 35%. Then assume by the fifth year, the GPU is scrapped.

In some cases, the argument made is that if you don't sign a long-term contract, because NVIDIA's computing power triples or quadruples every two years, while the price only increases by 50% or doubles... then the price of H100... of course, its value in the market in 2024 might be $2 (corresponding to a 35% gross margin), but by 2026, when Blackwell is deployed at ultra-high volumes, with millions of units per year, it might actually only be worth $1/hour. And when Rubin (although it starts shipping this year, will only reach ultra-high volume next year) is also deployed to the cloud at millions of units per year, with performance tripling again, and the price increasing by 50% or doubling, then Hopper might only be worth $0.70 per hour. So GPU prices will continue to decline. That's one perspective.
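The generational repricing Dylan describes can be framed as a toy model (a sketch; the 3-4x performance and 1.5-2x price multipliers are the rough ratios he states, taken here at midpoints):

```python
# Toy model of the "spot price decays each generation" view. Multipliers
# are midpoints of Dylan's rough ratios, not measured data.
hopper_price = 2.00     # $/GPU-hour market price in 2024, per the transcript
perf_multiple = 3.5     # each generation: ~3-4x performance...
price_multiple = 1.75   # ...at only ~1.5-2x the price

# Price per unit of performance falls each generation, dragging the old
# chip's market-clearing price down proportionally:
for generation in ["Blackwell at volume", "Rubin at volume"]:
    hopper_price *= price_multiple / perf_multiple
    print(f"{generation}: Hopper worth ~${hopper_price:.2f}/hour")
# ~$1.00, then ~$0.50 -- the same order as the $1 and $0.70 quoted above
```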

Another perspective is, how much utility can you get from the chip? If you can build unlimited Rubins or the latest chips, then yes, that will indeed be the case. With the launch of new chips, the price per unit of performance decreases, and the spot or short-term contract price of Hopper will drop. However, due to the significant limitations on semiconductor capacity and deployment timelines, the pricing of these chips is not determined by the cost-performance ratio of alternatives I can buy today, but rather by the actual value I can derive from this chip today.

In this sense, let's take GPT-5.4 as an example. The operating cost of GPT-5.4 is much lower than that of GPT-4: it activates far fewer parameters, because it is a much sparser MoE model while GPT-4 is denser, so it is effectively much smaller. Additionally, there are many other advancements in training, reinforcement learning, model architecture, and data quality that make GPT-5.4 significantly better than GPT-4. Moreover, its inference cost is lower. Running on H100, it can serve far more tokens than GPT-4 could. So it is producing more tokens at higher quality.

What is the maximum total addressable market (TAM) for GPT-4 tokens? Perhaps tens of billions, maybe hundreds of billions. Market acceptance takes time. For GPT-5.4, this number could exceed one hundred billion dollars. However, there are factors such as market acceptance lag, intense competition, and continuous improvements by others. If improvements were to stop, the value of H100 now depends on the value it can generate from GPT-5.4, rather than from GPT-4. These labs are in a competitive environment, so their profit margins cannot be infinitely high. This creates a very interesting dynamic: today's H100 is worth more than it was three years ago.

Dwarkesh Patel

This is crazy. It's also very interesting in the long run. If we really develop AGI models, if we have truly human-level intelligent agents... the estimates of how many flops the human brain performs are very rough. But in terms of flops, an H100 is estimated to perform about 1e15 operations per second, which is similar to some estimates of the human brain's flops. Clearly, in terms of memory, the human brain is much larger. An H100 has 80 GB, while the human brain may be at the PB level.

Dylan Patel

Oh, really? You have PB levels? Bro, show me a PB-level string of 0s and 1s.

Dwarkesh Patel

That's exactly the point I wanted to make.

Dylan Patel

No, we just have the best sparse attention technology ever.

Dwarkesh Patel

Seriously. In terms of compressed information, it could be at the PB level. The human brain is an extremely sparse MoE. But anyway, imagine a human knowledge worker creating six figures of value in a year. If a single H100 can generate comparable value, if we truly have human intelligence on a server, then the value of an H100 is high enough to recoup its cost in a few months.

So when I interviewed Dario, the point I wanted to express was not that I believe the singularity will arrive in two years, and therefore Dario urgently needs to buy more computing power, although his revenue growth does require him to do so. What I wanted to express is that, given what Dario seems to have said—he mentioned that we could have a data center composed of geniuses within two years, at most five years, and that a data center composed of geniuses should be able to generate trillions in revenue—then his continuous statements about being more conservative with computing power, or as you said, being less aggressive than OpenAI, make no sense at all.

I think this point was misunderstood, and then people started attacking me, saying, "Oh, this podcast host wants to convince the CEO of this hundred billion dollar company to take a gamble." I just wanted to say that his internal statements are inconsistent. Anyway, it's good to clarify this now.

Dylan Patel

I think going back to the previous point, if the model is that powerful, then the value of GPUs will increase over time. Currently, only OpenAI and Anthropic have this view. But as we move forward, everyone will see the value of each GPU skyrocketing. So in that sense, you should lock in computing power now.

Interestingly, in the style of Anthropic, there's a joke that they have commitment phobia, a bit of a polyamorous flavor. Not referring to Dario himself, but it's a joke.

Dwarkesh Patel

That explains everything. By the way, there's an interesting concept in economics called the Alchian-Allen effect, which essentially states that if you add the same fixed cost to two goods, one of higher quality and one of lower quality, then at the margin people will be more inclined to choose the higher-quality good.

For a specific example, suppose good-tasting apples sell for $2, and bad-tasting apples sell for $1. Now suppose an import tariff is imposed on them. Then the prices of good apples and mediocre apples become $3 versus $2.

Dylan Patel

Is it because both increased by $1, or should it be a 50% increase?

Dwarkesh Patel

No, because both increased by $1. The whole effect is that if a fixed cost increment is applied to both, then the price difference between them, that is, the price ratio, will change. Previously, the expensive one cost twice as much as the cheaper one. Now it is only 1.5 times as much.

So I wonder if applying this to AI means that if GPUs become more expensive, then the price of computing power will also have a fixed cost increase. As a result, this will encourage people to be willing to pay a higher profit margin for slightly better models. Because computing power will cost this much anyway, I might as well pay a little more to ensure I'm using the best model rather than a slightly inferior one.

Dylan Patel

So Hopper has risen from $2 to $3. If one Hopper can generate one million tokens of Opus, or two million tokens of Sonnet, then the price difference between Opus and Sonnet has narrowed because the price of GPUs has increased from $2 to $3.
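In numbers (a sketch; the apple prices are Dwarkesh's example, and the Hopper move from $2 to $3 is from the conversation, but the per-unit model prices below are hypothetical placeholders, not real Opus or Sonnet pricing):

```python
# Alchian-Allen effect: adding the same fixed cost to two goods narrows
# their relative price gap.
def price_ratio(expensive: float, cheap: float, fixed_add: float = 0.0) -> float:
    """Relative price after adding the same fixed cost to both goods."""
    return (expensive + fixed_add) / (cheap + fixed_add)

print(price_ratio(2, 1))      # 2.0  -- good vs. mediocre apples, no tariff
print(price_ratio(2, 1, 1))   # 1.5  -- both up $1: the gap narrows

# Hypothetical model prices: frontier model at $10, cheaper model at $2
# per unit of usage; a $1 compute surcharge (Hopper going from $2 to $3)
# narrows the effective gap the same way:
print(price_ratio(10, 2))     # 5.0
print(price_ratio(10, 2, 1))  # ~3.7 -- the better model looks relatively cheaper
```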

Interesting. I think this makes a lot of sense. We see that all usage today is concentrated on the best models, and all revenue comes from the best models. In a world with limited computational resources, two things will happen. First, companies that do not have commitment phobia and have signed five-year compute contracts have locked in huge margin advantages. They locked in compute for the next five years at prices from 2, 3, or 5 years ago.

And if your five-year contract has been in effect for three years, while others' two- or three-year contracts are expiring, and now they want to buy at current prices, which are determined by model value, then the price will be much higher. So those who committed early generally have higher margins. The proportion of long-term contracts in the market is far greater than the proportion of short-term contracts that can be flexibly incremented at the last moment.

At the same time, where will the margins flow? As models become more valuable, to what extent can cloud service providers flexibly price? Look at CoreWeave, their average contract length is over three years. More than 98% of their compute contracts are over three years. So they ultimately face a dilemma of not being able to price flexibly. But their new capacity added each year far exceeds the previous stock.

Just this year alone, the new capacity added by Meta is equivalent to all the compute and data centers they had in 2022 for serving WhatsApp, Instagram, and Facebook, as well as running AI. They are adding so much this year.

Similarly, Meta is doing this, and CoreWeave, Google, Amazon, all these companies are adding massive amounts of compute every year. This new capacity is traded at new prices. In a sense, yes, as long as we are in a takeoff phase, you have locked in the previous cheap prices. "Oh, OpenAI increased from 600 megawatts to 2 gigawatts last year, and this year from 2 gigawatts to over 6 gigawatts, and next year from 6 gigawatts to 12 gigawatts." The new capacity is where all the costs are, not the previous long-term contracts.

So who holds the pricing power? It is the infrastructure providers. Now cloud service providers, new cloud companies, or hyperscale cloud service providers can charge higher profits. They can do this to some extent, but when you look upstream, who holds most of the memory and logic capacity? Mainly NVIDIA. They have signed many long-term contracts. Today they have $90 billion in long-term contracts in hand, and they are negotiating new three-year contracts with memory suppliers.

Amazon and Google are collaborating with Broadcom, with Amazon doing it directly, and there's also AMD. These companies hold all the chips because they have locked in production capacity. TSMC has not significantly raised prices, but memory suppliers are raising prices substantially, possibly doubling or tripling again, but at the same time, they are signing these long-term contracts.

Who can capture all the profits? It could be cloud service providers, chip suppliers, or memory suppliers, until TSMC or ASML breaks the situation and says, "No, we want to charge more." But in the meantime, can model providers achieve crazy high profits? At least this year, we will see model providers' profit margins rise significantly. Because they are so constrained by computing power that they have to suppress demand. Anthropic cannot continue at the current pace without suppressing demand.

00:24:52 – NVIDIA locked in TSMC's production capacity early; Google is under pressure.

Dwarkesh Patel

Let's talk about logic and memory. How exactly did NVIDIA lock in so much production capacity? I remember according to your data, by 2027, NVIDIA will occupy more than 70% of N3 wafer capacity, roughly that number. I forgot the specific numbers for memory from companies like SK Hynix and Samsung.

Think about how the new cloud business operates, how NVIDIA collaborates with it, or how the reinforcement learning environment business operates, and how Anthropic collaborates with it. In both cases, NVIDIA is purposefully trying to split complementary industries to ensure they have as much leverage as possible. They allocate production capacity to various random new cloud companies to ensure no single entity controls all the computing power.

Similarly, when Anthropic or OpenAI collaborates with data providers, they will also say, "No, we want to cultivate a large industry so that we are not bound by any single data environment supplier."

I wonder, on the 3-nanometer process—this will be for Trainium 3, TPU v7, and other possible accelerators—why did TSMC give all of this to NVIDIA instead of trying to split the market?

Dylan Patel

There are a few points here. On 3 nanometers, if we go back to last year, the vast majority of 3-nanometer production capacity was occupied by Apple. Apple is transitioning to 2 nanometers. Memory prices are rising, so Apple's shipments may decline. As memory prices rise, they will either cut profits or continue to push forward. Since they have long-term contracts, there will be some time delays, but Apple may reduce demand or transition to 2 nanometers faster, which currently can only be used for mobile chips. In the future, AI chips will also transition there. So Apple has this situation.

Apple is also negotiating with third-party suppliers because they are somewhat being squeezed out of TSMC's production capacity. TSMC's profit margins on high-performance computing (HPC, i.e., AI chips) are higher than on mobile chips because their advantage in the HPC field is greater than in the mobile field.

Look at TSMC's calculation logic; they actually provide quite good capacity allocation for companies that make CPUs. Think about Amazon with Trainium and Graviton, both on 3 nanometers, where Graviton is their CPU and Trainium is their AI chip. TSMC is more willing to allocate capacity to Graviton rather than Trainium because they believe the CPU business is more stable and has long-term growth potential.

As a conservative company that does not want to overly chase growth cycles, you would actually prioritize allocating capacity to slower-growing but more stable markets before allocating all new capacity to fast-growing markets. This is usually the case. The same goes for AMD; TSMC is much more enthusiastic about their CPU capacity allocation than for GPUs. The same applies to Amazon.

NVIDIA is a bit unique because, yes, they have CPUs, and they manufacture switches, networking equipment, NVLink, InfiniBand, Ethernet, and NICs. Overall, with the release of Rubin and its series of chips this year (GPUs are the most important), most of these will transition to 3 nanometers by the end of the year. However, NVIDIA will still receive the majority of the supply.

Part of the reason is that when you look at the market, TSMC and other companies forecast market demand in many ways, but this is also a market signal. The market signal is, "Hey, we need this much capacity next year. We need this much. We will sign non-cancelable, non-returnable contracts. We might even pay a deposit." NVIDIA just acted much earlier than Google or Amazon. In some cases, Google and Amazon encountered stumbling blocks. Some of their chips were delayed by several quarters. Chips like Trainium experienced delays.

In this case, the situation turned into, "Well, these guys are delayed, but NVIDIA wants more, more, more. We also need to check other parts of the supply chain to see if there is enough capacity?" They have to ask all the PCB suppliers, "Do you have enough PCB capacity?" All the PCBs come from one company in Asia, or many come from one company in Asia. They would ask, "Do you have enough PCB capacity? Good. Hey, memory suppliers, who has all the memory capacity? Good, NVIDIA has it. Great."

When you look at who is truly "AGI-pilled" enough to be willing to purchase computing power on long cycles—at levels that seem absurd to those who are not "pilled"—willing to pay quite high margins and sign contracts now because they believe future prices will only be worse, the same dynamic occurs in the semiconductor supply chain. I don't think NVIDIA is completely "AGI-pilled." Jensen Huang does not believe that software will be fully automated, etc.

Dwarkesh Patel

It's accelerated computing, not AI chips, right?

Dylan Patel

It's AI chips.

Dwarkesh Patel

But that's how he refers to it, right?

Dylan Patel

Yes. I think it's a broader term, AI is included, but it also encompasses physical modeling and simulation.

Dwarkesh Patel

But it seems like he hasn't embraced the main use cases.

Dylan Patel

I think he is embracing it, but I just don't think he's as swayed by AGI as Dario or Sam. But he is still much deeper into it than Google or Amazon were last Q3; he sees more demand.

The reason is simple. You can see all the data center construction. He would think, "Well, I want this market share." We tracked all the data centers, and there are many data centers that could be this or that. To some extent, Google and Amazon, especially Google, even though their TPUs are better for them to deploy, they still had to cram a lot of GPUs into their data centers because they didn't have enough TPUs to fill their data centers. They couldn't produce that many.

Dwarkesh Patel

I have a question about that. Did Google sell a million units, was it v7?

Dylan Patel

Yes.

Dwarkesh Patel

— the Ironwoods that were sold to Anthropic — and now you're saying that this year, next year, and I think from now on, the biggest bottleneck will be the logic and memory needed to manufacture these chips. Google has DeepMind, the third-largest AI lab. If this is such a big bottleneck, why don't they just give the TPUs directly to DeepMind instead of selling them?

Dylan Patel

That's again... the people at DeepMind would think, "This is crazy. Why would we do this?" But the Google Cloud people and Google executives have different thoughts.

You and I know the compute team at Anthropic. The two main people came from Google. They saw this mismatch, negotiated a deal, and were able to secure this compute before Google realized it. At least from the data we've discovered, the sequence of events was that at the beginning of Q3, over about six weeks, we saw a significant increase in TPU capacity. There were several increases during those six weeks.

They made multiple requests. Google even had to go to TSMC to explain why they needed to increase capacity so much because it was so sudden. A large part of that increased capacity was to sell to Anthropic. Because Anthropic saw the opportunity earlier than Google.

Then Google had "Nano Banana" and Gemini 3, which sent their user metrics soaring. Then Google's leadership realized, "Oh." Then they started making statements saying we have to double compute every six months, or whatever the specific number is.

They really woke up a lot, and then they went to TSMC and said, "We want more. We want more." TSMC replied, "Sorry, guys, we are sold out. We might be able to increase by 5-10% in 2026, but really, we have to plan for 2027."

In my view, there is this information asymmetry between the labs. I'm not sure about the specific details. This is a narrative I constructed based on all the supply chain data regarding wafer orders that I've seen, as well as the situation with Anthropic and Fluidstack signing data center agreements.

It’s clear to me that Google messed up. You can see this from Google's Gemini ARR. They had almost nothing from Q1 to Q3—there was a slight growth starting in Q3. But in Q4, their ARR reached $5 billion. It’s obvious that Google did not initially see a surge in revenue. In a sense, Anthropic had a bit of commitment phobia before their ARR exploded, even though they had more information asymmetry and saw what was going to happen in the future. Google was more conservative than Anthropic, and Google's ARR was even lower. So they were unwilling to act, and then they realized they should have.

Since then, Google has become absurdly "AGI-pilled" in action. They acquired an energy company. They paid deposits for turbines. They purchased an astonishing amount of land for supporting power grids. They went to utility companies to negotiate long-term agreements. They have been very proactive on data centers and power. I think Google woke up at the end of last year, but it took some time.

Dwarkesh Patel

How many gigawatts do you think Google will have by the end of next year?

Dylan Patel

Buy my data.

Dwarkesh Patel

You charge for this kind of information.

Dylan Patel

Yes, yes.

00:34:34 – By 2030, ASML will be the number one constraint on AI computing power expansion

Dwarkesh Patel

I feel that the bottleneck hindering our expansion of AI computing power changes every year. A few years ago, it was CoWoS packaging. Last year, it was power. You tell me what this year's bottleneck is.

But I want to understand what will constrain our deployment of the "singularity" five years from now?

Dylan Patel

The biggest bottleneck is the computing power itself. For this, the longest delivery cycle in the supply chain is not power or data centers. It is actually the semiconductor supply chain itself. The bottleneck has shifted back from power and data centers to chips.

In the chip supply chain, there are many different bottlenecks. There is memory, TSMC's logic wafers, and the fabs themselves. Building a fab takes two to three years, while a data center takes less than a year. We have already seen that Amazon can build a data center in just eight months. Because the fabs that manufacture chips are extremely complex, their delivery cycles vary greatly. The equipment needed to build a fab also has long lead times.

As we continue to expand, the bottleneck will shift based on the parts of the supply chain that are currently unable to meet demand. Previously, it was CoWoS, power, and data centers, but these are all projects with shorter delivery cycles. CoWoS is a simpler process of packaging chips together. Power and data centers are ultimately much simpler than actually manufacturing chips. The capacity of mobile or PC chips can be partially transferred to data center chips, and this transfer is feasible to some extent.

CoWoS, power, and data centers had to be built out from scratch as new supply chains, but that could be done quickly. Now, however, the mobile and PC industries—once the dominant customers of the semiconductor industry—no longer have excess capacity to hand over to AI. NVIDIA is now the largest customer of both TSMC and SK Hynix, the largest memory manufacturer. The possibility of reallocating capacity from ordinary PCs and smartphones to AI chips has essentially disappeared. So the question now is: how do we scale up production of AI chips? This is the biggest bottleneck out to 2030.

Dwarkesh Patel

It would be very interesting if we could predict the absolute gigawatt limit by 2030 based solely on "we cannot produce more than this number of EUV lithography machines."

Dylan Patel

To further expand computing power, there are different bottlenecks this year and next, but ultimately by 2028 or 2029, the bottleneck will fall to the bottom of the supply chain, which is ASML. ASML manufactures the most complex machines in the world: EUV lithography tools. They sell for $300-400 million each. Currently, they can produce about 70 units per year. Next year, they will reach 80 units. Even with very aggressive supply chain expansion plans, by the end of this decade, they will barely exceed 100 units.

What does this mean? By the end of this decade, they will be able to produce about 100 of these tools, while currently, there are 70. How does this translate into AI computing power? We see the numbers mentioned by Sam Altman and many others in the supply chain: gigawatts, gigawatts, gigawatts. How many gigawatts are we adding each year? We see Elon saying he wants to build a hundred gigawatts in space.

Dwarkesh Patel

Every year.

Dylan Patel

Every year. Any issues or challenges facing these numbers are not actually related to power or data centers. We can delve into this, but the key lies in manufacturing chips.

Take a gigawatt of NVIDIA Rubin chips as an example. Rubin will be unveiled at the GTC conference, which I believe is the same week this podcast episode is released. To build one gigawatt of data center capacity with NVIDIA's latest chip (shipping by the end of this year), you need several different wafer types. You need about 55,000 3-nanometer wafers. You need about 6,000 5-nanometer wafers. Then you need about 170,000 DRAM memory wafers.

These three different categories each require a different number of EUV processes. To manufacture a wafer, there are thousands of steps; you need to deposit materials and then remove them. But the key step—at least for advanced logic chips, which accounts for 30% of the chip cost—actually does not involve adding anything to the wafer. You take the wafer, coat it with photoresist, which is a chemical that undergoes a chemical change when exposed to light. Then you put it into the EUV tool and expose it in a specific way. This is called lithography. There is a so-called mask, which acts like a template for the design.

Look at a leading 3-nanometer wafer: it has about 70 lithography layers, of which about 20 are the most advanced EUV layers. If you need 55,000 wafers to produce one gigawatt, and each wafer requires 20 EUV exposures, you can do the math: one gigawatt requires 1.1 million EUV exposures. Adding in the other parts (5 nanometer and all the memory), the final total is 2 million. So, one gigawatt requires about 2 million EUV exposures.

These tools are very complex. Think about what the tool does on the wafer: it moves, scans, and steps, and this process has to be repeated dozens of times across the entire wafer. When we talk about the number of EUV exposures, we mean one full pass over the entire wafer for one layer.

An EUV tool can expose about 75 wafers per hour, with the equipment running at about 90% uptime. Ultimately, you need about 3.5 EUV tools to complete the 2 million EUV exposures required for one gigawatt. So, 3.5 EUV tools can meet the demand for one gigawatt.
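Reassembling the numbers just quoted into one calculation (a sketch; every input appears in Dylan's answer above):

```python
# Dylan's 1 GW EUV math, reassembled. All inputs are quoted in the answer.
wafers_3nm = 55_000
euv_layers_3nm = 20
exposures_3nm = wafers_3nm * euv_layers_3nm     # 1.1M full-wafer exposures
exposures_total = 2_000_000                     # incl. 5nm logic and DRAM

wafers_per_hour = 75                            # single-tool EUV throughput
uptime = 0.90                                   # fraction of the year exposing
exposures_per_tool_year = wafers_per_hour * 24 * 365 * uptime  # ~591k

tools_per_gw = exposures_total / exposures_per_tool_year
print(f"3nm exposures alone: {exposures_3nm / 1e6:.1f}M")   # 1.1M
print(f"EUV tools per GW: {tools_per_gw:.1f}")              # ~3.4, i.e. ~3.5
```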

Think about this number; it’s interesting. What is the cost of one gigawatt? About $50 billion. And what is the cost of 3.5 EUV tools? That’s $1.2 billion. In fact, this number is much smaller, which is interesting. $50 billion in economic capital expenditure is allocated to data centers, while the tokens generated on top of that are worth even more. The potential $100 billion AI value could be injected into the supply chain, yet it is supported by these mere $1.2 billion tools, which themselves cannot quickly scale their supply chain.

Dwarkesh Patel

You recently wrote an article stating that TSMC's capital expenditure over the past three years was $100 billion, or $30 billion, $30 billion, and $40 billion per year. A small portion of this was used by NVIDIA for its chips in the 3-nanometer or earlier 4-nanometer processes. What was NVIDIA's profit last quarter? It was $40 billion. So, $40 billion multiplied by 4 is $160 billion. NVIDIA, as a single company, has converted a small portion of TSMC's $100 billion capital expenditure (which will be depreciated over many years, not just this year) into an annualized profit of $160 billion.

When you go down the supply chain to ASML, the situation becomes even more extreme, as they use machines worth $1 billion to produce one gigawatt of computing power. Of course, these machines can be used for more than a year, so their contribution to output is more than just that.

Now I want to understand how many such machines there will be by 2030, including those accumulated in previous years, not just those sold in that year. What does this mean? Sam Altman said he wants to achieve one gigawatt every week by 2030. When you add these numbers together, is it compatible with his goal?

Dylan Patel

If you think about it carefully, it is completely compatible. TSMC and the entire ecosystem already have about 250 to 300 EUV tools. Then this year, they will add 70 tools, 80 next year, and grow to 100 by 2030. By the end of this decade, you will have 700 EUV tools. 700 EUV tools, assuming 3.5 tools are needed per gigawatt—assuming all are used for AI, which is not the case—can produce 200 gigawatts of AI chips for data center deployment.

Sam wants 52 gigawatts per year. So he only needs to capture 25% of the market share. Obviously, a portion has to be allocated to mobile and PC, assuming we can still have consumer products and are not pushed out of the market. But roughly speaking, he means capturing 25% of the total global chip output. This is very reasonable because just this year alone, I think he can achieve 25% of the deployed Blackwell GPUs. That’s not crazy.
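The fleet arithmetic behind this exchange (a sketch; the installed base and the 70/80/~100 ramp endpoints are quoted above, while the intermediate-year additions are our interpolation):

```python
# 2030 EUV fleet arithmetic. Installed base and ramp endpoints are quoted;
# the intermediate years are interpolated assumptions.
installed_base = 275                       # ~250-300 tools already in the field
annual_additions = [70, 80, 85, 90, 100]   # this year through ~2030
fleet_2030 = installed_base + sum(annual_additions)   # ~700 tools

tools_per_gw = 3.5
ceiling_gw = fleet_2030 / tools_per_gw     # if every tool served AI chips
print(f"2030 fleet: ~{fleet_2030} tools -> ~{ceiling_gw:.0f} GW/yr ceiling")

altman_target_gw = 52                      # 1 GW per week
print(f"Altman's target = {altman_target_gw / ceiling_gw:.0%} of total output")  # ~26%
```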

Dwarkesh Patel

When did ASML start shipping EUV tools? Was it around the time of 7 nanometers? I don’t know the exact time. Are you saying that by 2030, they will still be using machines that were initially shipped in 2020? So for this decade, you will be using the same machine that is the most important in the world’s most advanced technology industry? I find that surprising.

Dylan Patel

ASML has been shipping EUV tools for almost ten years, but the real large-scale production started around 2020. These tools are not static. At that time, the throughput of the tools was lower. They have various specification requirements, such as overlay accuracy. As I mentioned before, you stack layers one by one. You will do some EUV processing and then go through many different process steps—depositing materials, etching, cleaning wafers—dozens of such steps before the next EUV layer.

There is a specification called overlay accuracy, which means: you have done all this work, drawn lines on the wafer, and now I want to draw some points. Suppose I want to draw some points to connect these metal lines and vias, and the previous layer has another set of vertical lines, so now you need to connect the wires that are perpendicular to each other. You must align them precisely. This is called overlay.

Overlay accuracy is a metric that ASML has been improving rapidly, and wafer throughput has improved rapidly as well. The price of the tools has increased, but not as much as tool performance. Initially, EUV tools were priced at $150 million; the latest tools, heading into 2028, are about $400 million. However, the performance of the tools has more than doubled, especially in throughput and overlay accuracy. Overlay accuracy refers to the ability to precisely align each new layer to the previous layers, even after the large number of process steps between exposures.

ASML is improving at an ultra-fast pace. It is also worth noting that ASML may be one of the most generous companies in the world. They hold the key link. No other company can compete with them. You can ask some of the other people we often talk about, like Leopold, and they will say, "Let the prices rise." Because they can. The margins are there. You can capture the margins. Nvidia is capturing the margins. Memory manufacturers are capturing the margins. But ASML has never raised prices beyond the extent of tool performance improvements.

In a sense, they always provide net benefits to their customers. It’s not that the tools have stagnated; it’s just that these tools are getting old. Yes, you can upgrade them, and new tools are constantly being introduced. For simplicity, in this podcast, we have ignored the advancements in overlay accuracy or throughput for each tool.

Dwarkesh Patel

You said we are producing 60 of these machines this year, and in the following years, it will be 70, 80. What would happen if ASML decided to double or triple its capital expenditures? What is stopping them from producing more than 100 units by 2030? Why are you so confident that even five years from now, you can be relatively sure of their output?

Dylan Patel

I think there are several factors here. ASML has not decided to "go all out" and expand capacity as quickly as possible. Overall, the semiconductor supply chain has not done so either. They have experienced booms and busts, and we can talk more about that. Basically, some manufacturers have only recently woken up to this, but overall, no one really saw the demand for 200 gigawatts of AI chips per year, or the trillions of dollars in semiconductor supply chain spending each year, coming. They did not buy into AI. They did not buy into AGI.

Dwarkesh Patel

We are going to reach a trillion dollars this year.

Dylan Patel

Yes, I understand what you mean, but I’m saying that no one in the supply chain really understands this. We are constantly being told that our numbers are too high, and when our numbers prove to be correct, they will say, "Oh, well, but your numbers for next year are still too high."

ASML's tools are mainly composed of four parts. There is a light source, manufactured by Cymer in San Diego. There is a mask stage, manufactured in Wilton, Connecticut. There is a wafer stage. And there is an optical system, the lenses. The last two parts are manufactured in Europe.

Look at each of these four parts; their supply chains are extremely complex. First, they have not attempted large-scale expansion; second, when they try to expand, the time delays are very long. Again, this is the most complex machine ever manufactured by humans, produced at any scale.

Let's talk specifically about the light source. What does the light source do? It drips tin droplets. Then it is perfectly bombarded three times with a laser. The first hit causes the tin droplet to spread out. When hit again, it expands into a perfect shape, and then it is bombarded with ultra-high power. The tin droplet gains enough energy to release EUV light, with a wavelength of 13.5 nanometers, and then this light is collected and directed to a lens group.

Then there is the lens group, manufactured by Carl Zeiss and some other vendors, but Zeiss is the most important part. They also have not attempted to expand production capacity because they do not see... they would say, "Because of AI, we have grown a lot. We grew from 60 to 100." But the reality is, "No, no, no. We need to grow to hundreds of units, but that's okay. Whatever."

Each of these tools, I think, has 18 such lenses, which are actually multilayer mirrors, perfectly stacked layer by layer with molybdenum and ruthenium (if I remember correctly), and then the light perfectly reflects off the top. When we think of lenses, they are shaped to focus light. It's like a mirror that is also a lens, so it's very complex. Any defect in these ultra-thin deposition layers will mess things up. Any curvature issue will mess things up.

There are many challenges to scaling up production. In a sense it is quite artisanal, because you are not producing tens of thousands of these optics each year. With 60 tools a year and 18 of these mirrors per tool, you are only dealing with roughly a thousand, and across all the lenses and projection optics, on the order of thousands.

Then there is the mask stage, which is also some very crazy stuff. It accelerates at something like 9 Gs, because as you step across the wafer, the mask stage and the wafer stage move in concert and must stay aligned. You focus all the light through the optics: here is the mask, and here is the wafer. The mask moves in one direction and the wafer moves in the opposite direction while a 26x33 millimeter field on the wafer is scanned; then it stops, moves to another part of the wafer, and scans again. Each scan completes in a few seconds, with the two stages moving in opposite directions at 9G acceleration.

Each of these things is a miracle of chemistry, manufacturing, mechanical engineering, and optical engineering because you have to align all of these and ensure they are flawless. All of this involves a lot of metrology because you have to test everything perfectly. If anything goes wrong, the yield will drop to zero because this is such a finely tuned system.

By the way, it is huge; you build it in a factory in Eindhoven, Netherlands, then disassemble it, transport it to the customer site on multiple planes, and reassemble and test it all over again. This process takes many, many months.

There are so many links in the supply chain, whether it's Zeiss manufacturing the lenses and projection optics, or Cymer, a subsidiary of ASML, manufacturing the EUV light sources. Each link has its own complex supply chain. ASML has commented that there are over ten thousand companies in their supply chain.

Dwarkesh Patel

Referring to individual suppliers?

Dylan Patel

Yes. Perhaps not all directly. It could be that Zeiss has so many suppliers, and those suppliers in turn have their own suppliers.

If you just think about it, you are talking about two physically moving objects, roughly the size of a wafer, whose precision must reach single-digit nanometers or finer, because the overlay error between layers across the entire system must be controlled to around 3 nanometers. If the overlay budget is 3 nanometers, the physical motion of each individual component must be accurate to well below that value, in most cases below 1 nanometer, because these errors accumulate. There is no way to just snap your fingers and increase production.
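
A back-of-envelope way to see why component-level precision must sit below 1 nanometer: if the roughly 3 nm overlay budget cited above is shared across several independent error sources, each source's allowance shrinks with the square root of the count. A minimal sketch, with the number of sources assumed purely for illustration:

```python
import math

# Minimal sketch: splitting a ~3 nm overlay budget (figure from the
# conversation) across independent error sources that add in quadrature.
overlay_budget_nm = 3.0
num_sources = 10  # assumed count of independent contributors, for illustration

# With equal shares: budget = sqrt(n) * per_source_error
per_source_nm = overlay_budget_nm / math.sqrt(num_sources)
print(f"Allowed error per source: ~{per_source_nm:.2f} nm")  # ~0.95 nm, sub-nanometer
```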

Take something simple like electricity. The U.S. has gone from zero power growth to 2% power growth, and that has been very difficult for the U.S. Yet that is a very simple supply chain, where few of the people involved are doing truly difficult, specialized work. The U.S. might have 100,000 electricians and people working in the power supply chain, or more.

When you look at ASML, they employ so few people. Carl Zeiss may have fewer than a thousand people working on this, and these people are super, super specialized. You can't just train anyone in the blink of an eye to do this. You can't instantly galvanize your entire supply chain.

NVIDIA has done a lot of work to enable the entire supply chain to deliver the capacity they need to produce this year. When you talk to Anthropic, they will say, "We are missing TPUs, we are missing training compute, we are missing GPUs." When you talk to OpenAI, they will say, "We are missing these things."

OpenAI and Anthropic know they need X. NVIDIA is not as easily swayed by AGI. They are building X - 1. Down the supply chain, everyone is doing X - 1. In some cases, they are doing X ÷ 2 because they are not swayed by AGI.

Ultimately, the bullwhip effect takes time to propagate. The degree to which people are "swayed by AI," and the desire to increase production, takes a long time to transmit down the chain. Once they finally understand the need to rapidly increase production... they think they understand. They think AI means going from 60 to 100 units, plus tools getting better and faster, light source power rising from 500 watts to 1,000 watts, and all the other technological advances and output increases across the supply chain. They think they are already significantly increasing production.

But if you calculate the numbers... what does Elon want? He wants to achieve 100 gigawatts in space by 2028 or 2029. Sam Altman wants to achieve 52 gigawatts per year by the end of this decade. Anthropic may need the same amount, and Google does too. When you look across the entire supply chain, you realize, wait, no, the supply chain cannot build what everyone wants in terms of computing power.

00:55:47 – Can't we just use TSMC's old fabs directly?

Dwarkesh Patel

I think in the data center supply chain, people have been saying similar things over the past few years, "We're stuck on this specific thing, so AI computing power can't scale beyond X." But as you wrote, if the power grid is the bottleneck, then we can generate power on-site using gas turbines, etc. If that doesn't work, there are many other alternatives to rely on.

I want to ask, can we imagine a similar situation happening in the semiconductor supply chain? If EUV becomes the bottleneck, can we just go back to 7 nanometers? Look at 7 nanometer chips like the A100, from A100 to B100 or B200, there is obviously significant progress.

How much of that progress is simply due to optimization of numerical precision? Suppose you hold FP16 precision constant from A100 to B100: B100's computing power is slightly above 1 petaflop, while A100 is about 300 teraflops.

Dylan Patel

Yes, 312.

Dwarkesh Patel

Keeping numerical precision constant, there is about a 3-fold improvement from A100 to B100. Part of it is process advancement, and part of it is improvements in the accelerator design itself, which can be replicated in the future.

It seems that the process improvement from 7nm to 4nm has a very small actual impact. I don't know the specific numbers, but let's assume 3nm has a capacity of 150,000 wafers per month, and eventually 2nm will have a similar amount. But 7nm also has a similar capacity.

If you have all these old wafers, maybe you apply a 50% discount because you get half as much compute per wafer, but using 7nm wafers to gain an additional 50 or 100 gigawatts doesn't seem so bad. Tell me why this idea is naive.

Dylan Patel

We might be crazy enough for this situation to actually happen because we just need more computing power, and the value of that computing power far exceeds the higher costs and power consumption of these chips. But to a large extent, this is also unlikely because some comparisons are unfair.

For example, from A100 (312 teraflops) to Blackwell (1,000 or 2,000 FP16 teraflops), and then to Rubin (about 5,000 FP16)... this is not a fair comparison because the design goals of these chips are vastly different. In the A100 era, NVIDIA optimized for FP16 and BF16 numerical precision. By Hopper, they cared less about those and focused on FP8. With Rubin, they no longer care about FP16 and BF16 at all, focusing mainly on FP4 and FP6. Numerical precision is a core consideration in their chip design.

Assuming we design a new chip based on modern numerical precision at 7nm, the performance gap will still be much larger than the FLOPS gap you mentioned. People often simplify it to FLOPS per watt or FLOPS per dollar, but that is not a fair comparison.

Look at Kimi K2.5 and DeepSeek. When you observe these two models running highly optimized software on Hopper versus Blackwell, you get vastly different performance numbers. That is largely not due to FLOPS or numerical precision: these models are actually 8-bit, so both Blackwell and Hopper run them at 8-bit, and Blackwell cannot even leverage its 4-bit capability. Yet the performance gap is still much larger.

Of course, shrinking process technology to make transistors smaller and giving each chip X number of FLOPS is one thing, but you forget the main limiting factors. These models are not running on a single chip but are running simultaneously on hundreds of chips. Look at the production deployment of DeepSeek, which has been running for over a year on 160 GPUs. They use that many GPUs to handle production traffic. They split the model across 160 GPUs.

Every time you cross from one chip to another, there is an efficiency loss. You have to transmit through high-speed SerDes links, which costs latency and power. All these dynamics degrade performance. As you continue to shrink process nodes, the amount of computation within a single chip increases. Data movement within a chip now runs at least tens of TB per second, if not hundreds. Between chips, the speed is about 1 TB per second.

Then, you need to move data between physically very close chips. You can only place so many chips in close physical proximity, so you have to place chips in different racks. The data movement speed between racks is several hundred Gbps, 400G or 800G, which is about 100 GB per second.

So you have this huge staircase: intra-chip communication is super fast, intra-rack is an order of magnitude slower, and inter-rack is another order of magnitude slower. As you break through the boundaries of the chip, you will suffer performance losses.
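
The staircase is easy to see in numbers. A rough sketch using the approximate figures from the conversation (orders of magnitude, not exact specs):

```python
# The bandwidth "staircase" described above, with the speaker's rough figures.
tiers_gb_per_s = {
    "on-die (within a chip)": 10_000,       # "tens of TB/s" and up
    "intra-rack (chip to chip)": 1_000,     # "about 1 TB/s"
    "inter-rack (400G/800G network)": 100,  # "about 100 GB/s"
}
for tier, bw in tiers_gb_per_s.items():
    print(f"{tier}: {bw:,} GB/s")
# Each step down the staircase loses roughly an order of magnitude.
```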

The reason I explain this is that when you compare Hopper and Blackwell, even with chips in the same rack, Hopper is significantly slower. In each domain, the bandwidth Blackwell has relative to the task, tens of TB per second between processing units versus single TB per second, is much higher, and thus the performance is much higher. When you look at DeepSeek and Kimi K2.5 inference at 100 tokens per second, the performance gap between Hopper and Blackwell is about 20 times.

This is not the 2x or 3x difference implied by FLOPS, even though they are on the same process node. It comes from networking and from what each generation is designed around. You can port some of this back, but when you look at what they are doing with Rubin on 3nm, some things are simply not feasible on A100, even if you build a new 7nm chip.

Some architectural improvements can be ported, while others cannot. The performance gap is not just a difference in FLOPS. In a sense, it is the cumulative difference in FLOPS per chip, inter-chip network speed differences, the FLOPS ratio of a single chip to the entire system, and the memory bandwidth between a single chip and the entire system. All these factors add up.

Dwarkesh Patel

Can I ask a very naive question? The current B200 has two dies on one chip, so you can achieve that kind of bandwidth without going through NVLink or InfiniBand. Next year, Rubin Ultra will have four dies on one chip. What is stopping us from directly using the old... how many dies can fit on one chip while still achieving tens of TB of bandwidth per second?

Dylan Patel

Even within Blackwell, there are performance differences between on-die and die-to-die communication. Those penalties are obviously much smaller than leaving the package entirely. When you increase the number of dies, there is some performance loss. It's not perfect, but it's much better than going between separate packages.

How scalable is advanced packaging? The way Nvidia does it is CoWoS. Google, Broadcom, MediaTek, and Amazon's Trainium are all doing CoWoS. But you can also look back at what Tesla did with Dojo, which they later canceled and then restarted. Dojo is a chip the size of an entire wafer, with 25 dies on it. There are trade-offs: they cannot put HBM on it. But the upside is those 25 dies. So far, it may still be the best chip for running convolutional neural networks. It just doesn't perform well on transformers, because the chip's shape, its memory, its arithmetic ratios, and all those specifications are not well suited to transformers. They are better suited to CNNs.

The Dojo chip is optimized around CNNs, and they made a larger package. But as you make the package larger, other limitations appear: network speed, memory bandwidth, and cooling capacity. All of these start to bind. It's not simple. But yes, you will see a trend of more dies per package, and yes, you can do this on 7nm.

In fact, Huawei is doing this with their Ascend 910C and D. They initially placed one die, then two. They focused on scaling up the package because that is an area where they can make faster progress when process technology cannot shrink. But ultimately, whatever packaging you can do with 7nm chips, you can likely also do at 3nm.

01:05:37 – When will Asia surpass the West in the semiconductor field?

Dwarkesh Patel

If we end up in a world by 2030 where the West has the most advanced process technology but no large-scale capacity increase, and Asia... I don't know if you think they will have EUV and 2nm or something else by 2030. But they are very committed to semiconductors and are in mass production.

Basically, I want to know in which year there will be a crossover point where our advantage in process technology has diminished enough, and their advantage in scale has increased enough. Also, if they have the advantage of a fully localized supply chain—rather than relying on random suppliers from Germany and the Netherlands—does that mean Asia will lead in producing large-scale flops?

Dylan Patel

So far, Asia still does not have a fully localized semiconductor supply chain.

Dwarkesh Patel

But will they have it by 2030?

Dylan Patel

By 2030, they could achieve that. The amount they can import from ASML is significant. But the vast majority of ASML's revenue, especially in EUV, is still on the Western side, along with Japan, Korea, etc.

Dwarkesh Patel

But they are also trying to make their own DUV and EUV tools, right?

Dylan Patel

They are trying to do all these things. The question is how quickly they can make progress and scale up production while ensuring quality. So far, we haven't seen that. Right now, I am very optimistic that they will be able to do these things in the next five to ten years. They will really scale up production and operate at high speed. They have more engineers working on this and a greater willingness to invest money to solve this problem.

Dwarkesh Patel

So by 2030, will they have fully localized DUV?

Dylan Patel

I definitely think so. DUV, yes.

Dwarkesh Patel

What about fully localized EUV by 2030?

Dylan Patel

I think they will have usable tools, but I don't think they will be able to mass-produce yet. Being operational and producing at volume are two different things; the latter is the hell of mass production. ASML had working EUV tools in the early 2010s, but they lacked precision, were not ready for large-scale production, or were not reliable enough. You have to gradually bring up yield, and that takes time.

The hell of mass production takes time. That's why it takes five to seven years to go from the lab to large-scale production in the wafer fab.

Dwarkesh Patel

How many DUV tools do you think they can manufacture by 2030?

Dylan Patel

ASML?

Dwarkesh Patel

No, China.

Dylan Patel

That's a good question. It's a bit challenging to examine this supply chain, especially in Asia. We are working very hard. In some cases, they are purchasing things from Japanese suppliers. If they want a fully localized supply chain, they cannot buy these lenses, projection optics, or worktables from Japanese suppliers. They must manufacture them in-house.

It's really hard to say what level they can achieve. To be honest, I think it's a wild guess. But they might be able to produce about 100 DUV tools per year, while ASML currently produces hundreds of DUV tools each year.

No single company has a process node producing one million wafers per month. Elon says he wants to achieve that, and China will obviously strive for it too; SMIC, China's counterpart to TSMC, is also trying. Memory manufacturers might reach one million wafers per month, but not within a single fab.

Think about how incredible that scale is, and it's hard to see the supply chain mobilizing for it.

When you push the timeline out that far, it really becomes challenging. We focus on tracking every data center, every fab, and all the tools. We track where they are going, but the lead time on these things is relatively short. We can make fairly accurate estimates of data center capacity from land purchases, permits, and turbine procurement. We know where these things are going; that is the data we sell.

When you get into 2035, things will be so different. Your error bars get so wide that it's hard to make estimates. But ultimately, if the takeoff or the timeline is slow enough, I don't see why China can't catch up significantly. In a sense there was a trough, three to six months ago, or maybe even now, where Chinese models were as competitive as ever. I think Opus 4.6 and GPT 5.4 have really widened the gap a bit, but I believe new Chinese models will emerge.

As we shift from selling raw chain-of-thought tokens to selling automated white-collar work, an automated software engineer where you send a request and get results back with a lot of invisible thinking behind it, distilling knowledge from American models into Chinese models will become more difficult.

Secondly, look at the scale of computing power these labs have. OpenAI had about 2 gigawatts at the end of last year. Anthropic will exceed 2 gigawatts this year. By the end of next year, they will all have a capacity of 10 gigawatts.

Also, look at all the capital expenditure going into data centers. Amazon spent $200 billion, Google $180 billion. All these companies are spending hundreds of billions of dollars on capital expenditures. This year, capital expenditure on data centers in the United States is close to one trillion dollars, roughly. What is the return on invested capital on these investments? You and I would think that for data center capex it is very high.

Look at Anthropic's revenue, which increased by $4 billion in January. In February, a shorter month, they increased by $6 billion. We will see what they can do in March and April, as computing power limitations are the bottleneck for their growth. The reliability of the cloud is quite low because they are too constrained by computing power. But if this situation continues, then the ROIC for these data centers will be very high.

At some point, due to all this capital expenditure, all the revenue generated by these models, and the downstream supply chain, the U.S. economy will begin to grow increasingly faster this year and next.

Look at Anthropic; their current ARR is $20 billion. At least according to the last report from The Information, the profit margin is below 50%. So that’s a leasing cost of $13 or $14 billion, which effectively means that someone has invested $50 billion in capital for Anthropic to generate their current revenue.
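
The implied math, hedged: the gross margin and rental-to-capex ratio below are assumptions chosen to land near the speaker's rough figures, not reported numbers.

```python
# Rough sketch of the invested capital implied by the compute bill.
arr = 20e9                # ~$20B ARR, speaker's figure
gross_margin = 0.33       # "below 50%"; exact value assumed for illustration
compute_bill = arr * (1 - gross_margin)   # ~$13-14B/yr of leased compute

# Assumed: a year of GPU rental runs roughly 25-30% of the capex behind it.
annual_rent_per_dollar_capex = 0.27
implied_capex = compute_bill / annual_rent_per_dollar_capex
print(f"Implied invested capital: ~${implied_capex / 1e9:.0f}B")  # ~$50B
```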

So in a sense, we are in a rapid takeoff phase. We are not talking about building a Dyson sphere before a certain date, but revenue is growing at such a speed that it is indeed impacting economic growth. The resources gathered in these labs are growing so quickly.

On the other hand, the returns on these infrastructure investments may be mediocre, perhaps not as good as hoped. Maybe Google is wrong to drive free cash flow down to zero and spend $300 billion on capital expenditures next year. Perhaps they are just wrong, and the Wall Street bears who don't understand AI are right. In that case, the U.S. has built all this capacity without getting good returns. Meanwhile, China is able to establish a fully vertically integrated, domestic supply chain, rather than the less vertically integrated supply chain the U.S. builds jointly with Japan, Korea, Southeast Asia, and Europe. In a sense, if the time required for AI to reach certain capability levels is longer than most guests on your podcast believe, then China could ultimately surpass us.

Dwarkesh Patel

So it is: short timeline, the U.S. wins; long timeline, China wins.

Dylan Patel

Yes, but I don’t know what "short timeline" means. I don’t think you have to believe in AGI to have a timeline where the U.S. wins.

01:16:01 – The Upcoming Huge Memory Crisis

Dwarkesh Patel

Let’s return to the topic of memory. I think people on Wall Street and in the industry are starting to understand how important this is, but perhaps the general public still doesn’t realize how big of a deal this is. So we are facing this memory crisis, as you mentioned.

Earlier I asked: can we solve the EUV tool shortage by going back to 7 nanometers? Let me ask a similar question about memory. HBM is made from DRAM, but it yields three to four times fewer bits per wafer than ordinary DRAM.

Is it possible for future accelerators to use regular DRAM instead of HBM, allowing us to gain more capacity from existing DRAM? I think this might be possible because if we want to have autonomous intelligent agents instead of synchronous chatbot applications, you might not necessarily need extremely low latency.

Perhaps you could accept lower bandwidth, since stacking DRAM into HBM is done for higher bandwidth. Could we move toward the opposite of HBM accelerators, essentially the opposite of "Claude Code Fast": a "Claude Slow"?

Dylan Patel

Ultimately, the incremental buyers willing to pay the highest price for tokens are also the buyers who are less sensitive to price. In a capitalist society, computing resources should be allocated to the goods of highest value, and the private market determines this through willingness to pay.

To some extent, Anthropic could actually release a slow mode. They could launch a "Claude Slow Mode" that significantly increases the tokens you get per dollar. They might be able to cut the price of Opus 4.6 by 4-5x while speed drops only 2x; the inference throughput-versus-speed curve on HBM already allows this. But they don't do it because, in practice, no one wants to use a slow model.

Moreover, while it's fine for models to run for hours on these agent tasks, if the model runs slowly those hours turn into a day; if it runs quickly, they stay an hour. No one really wants to wait a day, because the highest-value tasks also have a certain time sensitivity.

I find it hard to imagine... yes, you could use regular DRAM. But there are several challenges. One core limitation of chips is that they have a certain size, and all the input/output (I/O) is routed at the edges. Typically, HBM is on the left and right sides of the chip—so the I/O from the chip to HBM is on the sides—and then the top and bottom are for I/O to other chips.

If you switch from HBM to DDR, suddenly, the edge I/O bandwidth would significantly decrease, but the capacity per chip would significantly increase. However, the metric you really care about is the bandwidth per wafer, not the number of bits per wafer.

Dwarkesh Patel

Because what limits your FLOPS is how fast you can move the next matrix in and out, and for that you just need more bandwidth.

Dylan Patel

Yes, reading weights, as well as reading and writing KV caches. In many cases these GPUs are not fully utilizing memory capacity. This is clearly a system design issue: the model, hardware, and software need to be co-designed. You have to calculate how much KV cache is needed, how much stays on the chip, how much is offloaded to other chips and fetched when needed (for tool calls), and how many chips are used in parallel.

Clearly, the search space is very broad, which is why we have InferenceX, an open-source tool that can search for optimal configuration points for different chips and models across various inference tasks.

The key is that you are not always limited by memory capacity. You may be constrained by FLOPS, network bandwidth, memory bandwidth, or memory capacity. Simplifying it, there are these four constraints, each of which can be further subdivided.

If you switch to DDR, yes, each DRAM wafer yields 4 times the bits, but suddenly the constraints change significantly, and your system design has to change with them. You slow down, and is there even a market for that? At the same time, all those FLOPS are wasted because they are just waiting on memory. You don't even need that much capacity, because you can't really increase the batch size; otherwise the time to read the KV cache gets even longer.

Dwarkesh Patel

That makes sense. What is the bandwidth difference between HBM and regular DRAM?

Dylan Patel

One HBM4 stack—let's talk about what's inside Rubin since we've been benchmarking it—has a 2048-bit width, connected in an area about 13 millimeters wide. Its memory transfer rate is about 10 Giga-transfers per second.

So one HBM4 stack occupies about 11 to 13 millimeters of shoreline on the chip. On this shoreline, there are 2048 bits transferring at a rate of 10 Giga-transfers per second. Multiply them together, divide by 8 (bits to bytes), and you get about 2.5 TB of bandwidth per HBM stack per second.

When you look at DDR on the same stretch of shoreline, it might be 64 or 128 bits wide, and DDR5 transfer rates are between 6.4 and 8 gigatransfers per second. So your bandwidth is much lower: 64 bits times 8 gigatransfers divided by 8 gives you 64 GB per second. Even with a generous estimate, 128 bits at 8 gigatransfers over the same shoreline gives you only 128 GB per second, while HBM delivers 2.5 TB.

There is an order of magnitude difference in bandwidth per unit edge area. If your chip is square or sized 26x33 millimeters—this is the maximum size for a single die—you only have so much edge area. Inside the chip, you place all the computing units. You can try to change this by adding SRAM or cache. But ultimately, you will be greatly limited by bandwidth.
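
The shoreline arithmetic above is worth writing down explicitly. A minimal sketch using the figures from the conversation (HBM4 stack and DDR5 rates as stated, both rounded):

```python
def shoreline_bandwidth_gb_s(bus_width_bits: int, gigatransfers_per_s: float) -> float:
    """Bandwidth = bus width x transfer rate, converted from bits to bytes."""
    return bus_width_bits * gigatransfers_per_s / 8

# One HBM4 stack: 2048-bit bus at ~10 GT/s over ~11-13 mm of chip edge.
hbm4 = shoreline_bandwidth_gb_s(2048, 10)  # 2560 GB/s, i.e. ~2.5 TB/s
# DDR5 on a comparable stretch of shoreline: 128-bit bus at ~8 GT/s (generous case).
ddr5 = shoreline_bandwidth_gb_s(128, 8)    # 128 GB/s

print(f"HBM4: {hbm4:.0f} GB/s, DDR5: {ddr5:.0f} GB/s, ratio: {hbm4 / ddr5:.0f}x")
```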

Dwarkesh Patel

So the question is, where can you destroy demand to free up enough resources for AI? I think the situation is particularly dire because, as you said, if HBM requires 4 times the wafer area to get the same number of bytes, then to free up one byte for AI you have to destroy four bytes of consumer demand from laptops and phones.

What does this mean for the next year or two? Sorry for the long question, but in your newsletter, you mentioned that 30% of the capital expenditure of large tech companies will be allocated to memory by 2026?

Dylan Patel

Yes.

Dwarkesh Patel

That's crazy, right? 30% of $600 billion or whatever number is just for memory.

Dylan Patel

Yes. Obviously NVIDIA adds its margin on top, so you have to separate NVIDIA's profit from the underlying memory and logic. But ultimately, a third of their capital expenditure will go to memory.

Dwarkesh Patel

That's insane. With the memory crisis approaching, what should we expect in the next year or two?

Dylan Patel

The memory crisis will continue to worsen, and prices will keep rising. This will affect different parts of the market in various ways. Will people start to dislike AI more? Yes, because smartphones and PCs won't get better year after year. In fact, they will get worse year after year.

Dwarkesh Patel

If you look at the bill of materials for an iPhone, what percentage does memory account for? If memory prices double, how much more expensive will the iPhone be?

Dylan Patel

I believe an iPhone has 12GB of memory. In the past, it was about $3-4 per GB, which is $50. But now memory prices have tripled. Assuming it's $12 per GB, then you're talking about $150 compared to $50.

So that's a $100 cost increase, and that's just DRAM. NAND has the same market dynamics, so in fact the iPhone's bill of materials could increase by about $150. Apple will either pass this on to consumers or absorb it. I don't think Apple will significantly lower their margins; maybe they absorb a bit. But ultimately, this means the end consumer could pay an additional $250 for an iPhone.
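
Putting the speaker's rough numbers in one place (the NAND share is an assumption consistent with his ~$150 bill-of-materials delta, and the retail markup is assumed):

```python
# iPhone memory cost arithmetic, per the conversation (figures approximate).
dram_gb = 12
old_per_gb, new_per_gb = 4.0, 12.0  # roughly a 3x price increase

dram_delta = dram_gb * (new_per_gb - old_per_gb)  # ~$96, "about $100"
nand_delta = 50   # assumed: NAND contributes the rest of the ~$150 BOM delta
bom_delta = dram_delta + nand_delta

retail_markup = 1.7  # assumed multiplier if Apple passes the cost through at margin
print(f"BOM +${bom_delta:.0f}, retail impact up to ~${bom_delta * retail_markup:.0f}")
```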

This is just based on comparing last year's prices to today's prices. Apple will feel the pressure with some delay because they tend to sign long-term memory contracts for several quarters to a year. But ultimately, Apple will be significantly impacted. They will wait until the next generation iPhone is released to make adjustments.

But that's the high end of the market, with only a few hundred million phones sold each year. Apple sells about 200 to 300 million phones annually. The bulk of the market is the mid to low end. Previously, 1.4 billion smartphones were sold each year. Now it's about 1.1 billion. Our forecast is that this year it may drop to 800 million, and next year to 500 or 600 million.

Yes, for a $1,000 iPhone the bill of materials only increases by $150, and Apple has a large margin. But for cheaper phones, memory and storage are a much larger share of the bill of materials, and margins are lower, so there is far less room to absorb the increase. Moreover, these vendors typically do not sign long-term memory agreements.

This matters because if smartphone sales halve, the decline happens in the mid to low end, not the high end, so the bits released do not halve along with the units. Consumer devices currently account for more than half of memory demand. Even if smartphone sales halve, because of the shape of the decline the low end is cut by more than half while the high end is cut less: you and I will still buy those $1,000-plus phones even if they get slightly more expensive. Apple's sales will not decline as much as those of low-end smartphone suppliers.
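
A hypothetical segment mix shows why halved unit sales free up less than half the bits. The units and GB-per-phone below are invented for illustration, not the speaker's figures:

```python
# Hypothetical mix: (units before, units after, DRAM GB per phone), units in millions.
segments = {
    "low/mid-end": (900, 400, 6),   # takes most of the unit decline
    "high-end":    (300, 250, 12),  # declines far less
}

units_before = sum(b for b, _, _ in segments.values())
units_after = sum(a for _, a, _ in segments.values())
bits_before = sum(b * gb for b, _, gb in segments.values())
bits_after = sum(a * gb for _, a, gb in segments.values())

print(f"Units: -{1 - units_after / units_before:.0%}")      # ~-46%
print(f"DRAM demand: -{1 - bits_after / bits_before:.0%}")  # only ~-40%
```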

The same goes for PCs. The impact on the market is quite significant. DRAM is released and flows to those willing to sign long-term contracts and pay higher profit margins for AI chips, because ultimately they extract much larger profits from end users.

This could lead to increased resentment towards AI. Today, you can already see all the memes on PC subreddits and gaming PC Twitter. There are videos of dancing cats titled "This is why memory prices have doubled, and you can't buy a new gaming GPU or desktop." It will be worse when memory prices double again, especially for DRAM.

Another interesting dynamic is that it's not just DRAM, but also NAND. NAND prices are also rising. Both markets have seen very slow capacity expansion over the past few years, with NAND almost at zero. The proportion of NAND used for phones and PCs is higher than that of DRAM used for phones and PCs.

When you destroy demand, primarily for DRAM purposes, you release more NAND that can be allocated to other markets. The price increase for DRAM will exceed that of NAND because you are releasing more resources from the consumer side, and in fact, you are producing more memory for AI.

Dwarkesh Patel

Sorry, maybe you just explained this and I missed it. Is it because data centers use SSDs extensively?

Dylan Patel

Yes, but the usage is not as large as DRAM.

Dwarkesh Patel

Okay, so their prices will also rise because some will be used, but the demand for HBM is not as urgent. That makes sense.

One thing I didn't realize before reading your newsletter is that the constraints limiting logic expansion in the coming years are very similar to the factors limiting our ability to produce more memory wafers. In fact, it's the same machine, the EUV tool, that is also required for memory production. So I think someone might now ask, why can't we produce more memory?

Dylan Patel

As I mentioned earlier, the limiting factor today and next year is not necessarily the EUV tools; those become the limiting factor later in the decade. Currently, the limiting factor is simply that fabs haven't been built. Over the past three to four years, memory manufacturers did not build new fabs because memory prices were very low. Their margins were very thin; in fact, they were losing money on memory in 2023. So they decided not to build. The market has slowly recovered over time, but it really only got better last year.

Since 2024 we have been beating the drum that reasoning means long context, long context means large KV caches, and that means a lot of memory demand. We have been saying this for a year and a half, two years. Those who understood AI were buying memory heavily back then. You saw that dynamic, and now it is finally showing up in prices.

It took a long time, even though long context means larger KV caches and more memory is needed, which is obvious. Half the cost of accelerators is memory. They will certainly start investing heavily. It took a full year for this to really reflect in memory prices. Once memory prices reflected this, memory manufacturers took another three to six months to start building fabs. These fabs take two years to build. So we won't have really decent fabs to place these tools until the end of 2027 or 2028.

On the contrary, you see some quite crazy things happening to gain capacity. Micron bought a trailing-edge fab from a Taiwanese company. Hynix and Samsung are doing some pretty crazy things to squeeze capacity out of existing fabs, which will also have large ripple effects through the economy.

So why can't we build more capacity? There is no place to put the tools. It's not just EUV; there are other tools in DRAM and logic manufacturing. In logic, for N3, about 28% of the final wafer cost is EUV. When you look at DRAM, it's in the teens. It's rising, but it represents a much smaller proportion of the cost. These other tools are also bottlenecks, although their supply chains are not as complex as ASML's.

You will see Applied Materials, Lam Research, and all the other companies also aggressively expanding capacity. But there is no place to put the tools because the most complex buildings humans build are fabs, and fabs take two years to construct.

Dwarkesh Patel

I recently interviewed Elon, and his whole plan is that they want to build this TeraFab, and they want to build clean rooms. I won't even ask you about dirty rooms, but let's assume they built the clean rooms.

I have a few questions. First, do you think this is something Elon’s company can build much faster than traditional methods? This is not about manufacturing the final tools. This is just about building the facilities themselves. How complex is it to build clean rooms at an extremely fast pace? Is this something that Elon and his "fast action" approach can do much faster if this is the bottleneck this year or next year? Second, if, as you said, two years from now our bottleneck is no longer clean room space but tools, does that still matter?

Dylan Patel

Like any complex supply chain, it takes time, and the constraints change over time. Just because something is no longer a constraint doesn't mean that market is no longer profitable. For example, energy may not be a major bottleneck in a few years, but that doesn't mean energy won't grow rapidly and have profit margins. It's just that it's no longer the key bottleneck. In the foundry space, clean rooms are the biggest bottleneck this year and next. As we move into 2028, 2029, and 2030, there will still be constraints there.

Regarding Elon, he has a tremendous ability to acquire physical resources and very smart people to build things. The way he recruits top talent is by trying to build the craziest things. In the case of AI, this hasn't really worked because everyone is trying to build AGI. Everyone is very ambitious. But in the case of going to Mars, building rockets that can land themselves, fully autonomous vehicles, or humanoid robots, these are ways to recruit those who believe these are the most important problems in the world to work on, because he is the only one really trying hard.

In the semiconductor space, he has said he wants to build a fab producing one million wafers a month. No fab is that large. He has the potential to recruit a lot of really great people for this crazy mission of a million wafers a month. The first step is building clean rooms, and I think he is quite likely to achieve that. His idea that you can strip out the unnecessary parts and let it be a bit dirty is probably wrong. In fact, I think it's 100% wrong. Fabs need to be very clean. The air in a fab is fully exchanged every three seconds, that fast. There must be very few particles.

But I think he can build clean rooms. It will take a year or two. Initially, it won't be super fast, but over time, he will get faster. The really complex part is developing process technology and manufacturing wafers. I don't think he can do that quickly. It requires a massive accumulation of knowledge. The most complex work, which integrates very expensive tools and supply chains, is done by TSMC, Intel, or Samsung. The other two companies aren't even good at this, and it's already extremely complex.

Dwarkesh Patel

If in 2030, a disruptive technology happens to emerge, and we no longer use EUV, how surprised would you be? If what we use works better, is simpler to produce, and can be produced at a larger scale? I'm sure as an industry insider, this sounds like a completely naive question, but do you understand what I'm asking? What probability should we assign to something completely unexpected that makes all of this irrelevant?

Dylan Patel

For something very simple and easy to scale, I assign a very, very low probability. There are quite a few companies working on what are effectively particle accelerators, or synchrotrons, that produce light either at 13.5 nanometers (like EUV) or at narrower wavelengths, like 7-nanometer X-rays, which is then fed into lithography tools. But these are huge particle accelerators producing that light. Building them is very complex.

Several companies are working on this, and I think it could become a major disruption in the industry beyond EUV. But I don't think we will magically build something new, direct-write, super simple, and mass-manufacturable, although there are some attempts being made in this area.

Dwarkesh Patel

I ask because if you think about Elon’s past companies, rocket technology was once— and still is—considered extremely complex.

Dylan Patel

Listen, compared to Elon, I'm just a naive chatterbox. What have I built? So maybe it is possible.

Dwarkesh Patel

To manufacture more memory in the future, can we do 3D DRAM like we do 3D NAND, and then go back to DUV?

Dylan Patel

That is the hope at the moment. Everyone's roadmap for 3D DRAM is that you will still use EUV because you want tighter overlay accuracy. When you do these subsequent processing steps, everything is vertically stacked, and you have more layers stacked together. You want the spacing to be tighter. So overall, people are still trying to do it with EUV.

But what 3D does is change the math on how many bits a single EUV pass can produce. If you switch to 3D DRAM, that number increases dramatically. That is where the hope lies. Currently, everyone's roadmap goes from today's 6F² cell to the 4F² cell, and finally to 3D DRAM by the end of this decade or the beginning of the next. There is still a lot of R&D, manufacturing, and integration work to be done. I wouldn't say it's impossible; I think it is very likely to happen.

This will also require a massive reorganization of the fabs. The tool composition in the fabs will be very different. Lithography tools are actually the only things that don't change much. But their numbers, relative to the different types of chemical vapor deposition, atomic layer deposition, dry etching, or various etching chambers with different chemical properties... for different process nodes, you have all these different tools. You can't quickly convert a logic fab into a DRAM fab, or vice versa, or convert a NAND fab into a DRAM fab.

Similarly, existing DRAM fabs need a lot of retooling just to move from 1-alpha to 1-beta to 1-gamma process nodes, because they have to bring in EUV and change the deposition and etch material stacks. And the EUV tools have to be there. When you then switch to 3D DRAM, the transformation is even greater, so these fabs will need a lot of reorganization.

That would be a huge disruption. It would generally reduce the demand for EUV. But as we have seen for a long time, lithography's share of wafer cost has been increasing. Around 2014 it accounted for 17% of wafer cost, and in the years since it has risen to 30%. For DRAM it is in the low to mid teens, now trending toward the high teens, and before we reach 3D DRAM it may cross 20%. But then, once we reach 3D DRAM, the proportion of EUV in the final wafer cost will decline again.

Dwarkesh Patel

I think you are more concerned about how it constrains production rather than the cost percentage.

Dylan Patel

Right, but the cost percentage—

Dwarkesh Patel

It’s a proxy metric, yes. If you are Jensen Huang or Sam Altman, or others who can greatly benefit from the scaling of AI computing power, they might go to TSMC and say, "Why can’t we get Y and Z?" But I think your point is that, in a sense, what TSMC does doesn’t matter. In fact, even if Intel and Samsung build more fabs, in the long run, you will still be constrained by ASML and other tool and material manufacturers.

First, is that understanding correct? Second, should people in Silicon Valley now go to the Netherlands and try to persuade ASML to make more tools to gain more AI computing power by 2030?

Dylan Patel

We are seeing an interesting dynamic in 2023, 2024, and 2025. Those who saw the energy bottleneck earlier than others asymmetrically went to Siemens, Mitsubishi, and of course GE Vernova, to buy up turbine capacity. Now they can deploy these turbines at a high price due to energy issues.

Similarly, this could happen with EUV, except ASML won’t easily believe any random fool wanting to buy EUV tools. These turbines are much cheaper than EUV tools, and the production quantities are much larger. Especially when you get into industrial gas turbines, not just combined cycle, but also cheaper, smaller, and less efficient models, people will put down deposits for them.

Someone could do this. Someone should go to the Netherlands and say, "I’ll give you a billion dollars. You give me the right to purchase ten EUV tools two years from now, and I want to be first in line." Then for the next two years, you walk around waiting for everyone to realize, "Oh no, I don’t have enough EUV tools," and then you try to sell your options at a premium. What you are doing is simply saying, "ASML, you are really foolish. You haven’t made enough profit on these tools. I want to make that profit." The question is, would ASML agree? I don’t think so.

Dwarkesh Patel

One possibility is that they could at least get demand signals from this to increase production.

Dylan Patel

It’s possible. I agree.

Dwarkesh Patel

But it sounds like you are saying that even if they want to increase production, they can’t due to the supply chain.

Dylan Patel

Right. But that's exactly where the market is... If they can't increase production, just as TSMC can't ramp that quickly while demand is skyrocketing, then the obvious move is arbitrage. You and I know that demand is far higher than they forecast and than they can build capacity for.

You can arbitrage by locking in production capacity, making forward contracts, and then trying to sell at a high price when others realize everything is doomed and we don't have enough capacity. Then you will gain the super high profits that ASML and TSMC should have received. But the problem is, I don't know if ASML and TSMC would agree to do this.

01:42:34 – Expanding power supply in the U.S. will not be a problem

Dwarkesh Patel

Now let me ask you about power. It sounds like you think power can be expanded at will.

Dylan Patel

Not at will, but it can be.

Dwarkesh Patel

But it can exceed those numbers. If I remember correctly, your blog post about how AI labs can increase power supply suggests that GE Vernova, Mitsubishi, and Siemens can produce 60 gigawatts of gas turbines each year. Then there are other sources, but they are not as important as turbines.

Only a portion of that will be used for AI, I guess. If by 2030 we have enough logic and memory to do 200 gigawatts a year, do you think these things will somehow increase to over 200 gigawatts a year, or what are your thoughts?

Dylan Patel

Right now we are at 20 or 30. By the way, this is critical IT capacity, which is an important point. When I talk about these gigawatts, I mean critical IT capacity: the power consumed once servers are plugged in. But there are losses throughout the chain: transmission, conversion, cooling, and so on. So you should add another 20-30% on top of this year's 20 gigawatts, or the 200 gigawatts at the end of this decade, to get the actual total power needed.

Then there is the capacity factor. Turbines do not run 100% of the time. If you look at PJM, which I believe is the largest grid operator in the U.S., covering the Midwest and parts of the Northeast, their model calls for about 20% surplus capacity. And within that, turbines are counted at 90% of capacity because they are derated for reliability, maintenance, and so on. In reality, because of all these factors, nameplate generation capacity always ends up much higher than the final critical IT capacity.
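
A quick sketch of converting critical IT gigawatts into nameplate generation, using the factors just described (the overhead is stated as 20-30%; 25% is assumed here):

```python
# Critical IT capacity -> nameplate generation, per the factors above.
critical_it_gw = 20       # roughly this year's figure from the conversation
overhead = 1.25           # +20-30% for transmission, conversion, cooling losses
reserve_margin = 1.20     # PJM-style ~20% surplus capacity
turbine_derate = 0.90     # turbines counted at ~90% for reliability/maintenance

facility_gw = critical_it_gw * overhead
nameplate_gw = facility_gw * reserve_margin / turbine_derate
print(f"{critical_it_gw} GW critical IT -> ~{nameplate_gw:.0f} GW nameplate")  # ~33 GW
```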

But it's not just turbines. If you only generated power with turbines, that would be simple, boring, and easy. Humans and capitalism are much more inventive. The whole point of that blog post is that, yes, only three companies manufacture combined cycle gas turbines, but there is much more we can do. We can use aero-derivative units: airplane engines turned into power turbines. There are even new entrants, like Boom Supersonic trying to do this in collaboration with Crusoe, and other similar products already on the market.

And there are medium-speed reciprocating engines, engines that run like diesels. Ten companies manufacture them. I'm from Georgia, where people used to say, "Oh man, your Ram truck has a Cummins in it." The automotive side of that business is declining, so these companies have capacity that can be redirected to powering data centers. You can put in all these reciprocating engines. It's not as clean as combined cycle, but if you're willing, you can convert them from diesel to gas.

What about marine engines? All those engines built for large cargo ships are great. Nebius is doing this for a Microsoft data center in New Jersey: they generate power with marine engines. Bloom Energy is doing fuel cells. We have been very bullish on them for a year and a half because their ability to ramp production is very strong. Even if the cost is somewhat higher than combined cycle (the best option on cost and efficiency), they can ramp very quickly.

There are also solar power with batteries, and as the cost curve continues to decline, they can come online. And wind energy, you might only be able to reach 15% of maximum power because the wind fluctuates, but you can add batteries. There are all these options.

Another thing is that the grid is sized to not cut power during the peak of the hottest summer days. But in reality, that's a load peak that is 10-20% higher than average. If you just install enough utility-scale batteries, or have peaking plants that only run a small portion of the year—those can be gas, industrial gas turbines, combined cycle, batteries, or any other sources I mentioned—then suddenly, you have released 20% of the capacity of the U.S. grid for data centers. Most of the time, this capacity is idle. It's just for a few hours of peak load on those few days of the year. If you have enough capacity to absorb that peak load, then suddenly you've shifted it all.
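
The scale of that peak-shaving argument, sketched with the round numbers from the conversation:

```python
# Peak-shaving headroom: if the grid is sized for a peak ~20% above typical
# load, shaving that peak frees the headroom for steady loads like data centers.
us_grid_gw = 1000         # "terawatt range", speaker's rough scale
peak_headroom = 0.20      # peak runs 10-20% above average; upper bound used

freed_gw = us_grid_gw * peak_headroom
print(f"~{freed_gw:.0f} GW of potential headroom")  # ~200 GW vs ~20-30 GW of AI today
```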

Today, data centers only account for 3-4% of the electricity of the U.S. grid, and by 2028, it will reach 10%. But if you can release 20% of the capacity of the U.S. grid like this, that's not crazy. The U.S. grid is in the terawatt range, not the hundreds of gigawatts range. So we can add more energy.

I'm not saying this is easy. These things will be difficult. There are many tough engineering problems, people have to take risks, and new technologies must be used. But Elon was the first to do this self-supplied gas power generation, and since then, we've seen a surge in various things people are doing to get power. They are not easy, but people will be able to do it. The supply chain is much simpler than chips.

Dwarkesh Patel

Interesting. He mentioned in the interview that for the specific blades of the specific turbine he is looking at, the delivery time has been pushed to after 2030. Your point is—

Dylan Patel

That's fine. There are many other ways to generate energy. It's just less efficient, that's okay.

Dwarkesh Patel

Currently, the capital expenditure for combined cycle gas turbines is $1,500 per kilowatt. Are you saying it's reasonable to use technology that is much more expensive than that, or that the other options become cheap enough to be competitive?

Dylan Patel

Absolutely correct. It could be as high as $3,500 per kilowatt. That could be twice the cost of combined cycle, while the TCO of GPUs only increases by a few cents per hour.

We've been talking about Hopper pricing at $1.40 an hour; assume electricity prices double. Then Hopper goes from $1.40 to $1.50. I don't care, because model improvements are happening so quickly that their marginal utility far exceeds that 10-cent increase in energy cost.
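
Why roughly ten cents: a hedged reconstruction with assumed figures chosen to land near the speaker's $1.40 to $1.50 example (GPU draw, facility overhead, and electricity price are all assumptions):

```python
# Electricity's share of a GPU-hour, illustrative numbers only.
gpu_watts = 700           # assumed Hopper-class accelerator draw
facility_overhead = 1.4   # assumed multiplier for cooling/conversion losses
usd_per_kwh = 0.10        # assumed baseline electricity price

electricity_per_hr = gpu_watts / 1000 * facility_overhead * usd_per_kwh  # ~$0.10
rental_per_hr = 1.40
print(f"Electricity: ~${electricity_per_hr:.2f} of a ${rental_per_hr:.2f}/hr rental")
# Doubling the power price adds roughly another $0.10/hr: $1.40 -> ~$1.50.
```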

Dwarkesh Patel

So you're saying that 20% of the grid, which is roughly a terawatt in total, could be freed up through utility-scale batteries, raising the load you're willing to put on the grid, and so on.

Dylan Patel

The regulatory mechanisms there are not easy, by the way.

Dwarkesh Patel

But if that assumption holds, that's 200 gigawatts. Just from the different natural gas generation sources you mentioned—various engines and turbines—how many gigawatts can they collectively release by the end of this decade?

Dylan Patel

We're tracking this in our data. There are over 16 different manufacturers of natural gas generation equipment. Yes, there are only three companies that manufacture combined cycle turbines, but we're tracking 16 different suppliers, and we have all their orders. It turns out there are hundreds of gigawatts of orders flowing to various data centers.

As we approach the end of this decade, we believe that about half of the new capacity will be from self-generation plants. Self-generation is almost always more expensive than grid connection, but there are many issues with grid connection: permits, interconnection queues, and so on. So even though it's more expensive, people still choose self-generation.

They self-generate in various ways. It could be reciprocating engines, marine engines, or aviation engines. It could be combined cycle, although combined cycle is less suited for self-generation. It could be Bloom Energy's fuel cells, or solar with batteries. It could be any of these.

Dwarkesh Patel

And you're saying any one of these could individually do tens of gigawatts?

Dylan Patel

Any one of these could individually do tens of gigawatts, and as a whole, they could do hundreds of gigawatts.

Dwarkesh Patel

Okay. So just this alone should be enough—

Dylan Patel

Electricians' wages could double or triple again. There will be a lot of people entering this field, a lot of people making money, but I don't think that's the main bottleneck.

Dwarkesh Patel

Right now in Abilene, Crusoe is building a 1.2 gigawatt data center for OpenAI, and I think they have 5,000 people working there, at least at peak. If you scale that to 100 gigawatts, and I'm sure things will become more efficient over time, that would require something like 400,000 people.

If you think about the U.S. workforce, how many electricians are there, how many construction workers... I think there are 800,000 electricians, and I don't know whether they can all be redeployed this way. There are millions of construction workers. But if we are in a world that adds 200 gigawatts every year, will we eventually be limited by the workforce? Or do you think this is not actually a real constraint?

Dylan Patel

The workforce is a significant constraint. It is a huge constraint. People need to be trained. Similarly, we may start importing high-skilled labor. A highly skilled electrician involved in decommissioning power plants in Europe now comes to the U.S. to build facilities that deliver high-voltage electricity to data centers, which makes sense.

Humanoid robots or at least robotic technology may start to help, but the main factor in reducing the number of people will be modularization and manufacturing in factories in Asia. Unfortunately, for the U.S., countries like South Korea and Southeast Asia will increasingly ship prefabricated data center modules. These will be brought in. Now, you typically bring in servers or a rack and then connect it to different components that come from different places.

But now, you would ship it to a factory and integrate the whole thing. Maybe it's a 2-megawatt module that converts high-voltage AC power to the DC voltage you deliver to the racks, or something similar. Or for cooling, you bring in a fully integrated unit that has many cooling subsystems already assembled because plumbing is also a significant constraint.

Additionally, instead of individual racks (which require people to connect all these racks with cables), you can use a sled that places an entire row of servers on it, shipped directly from the factory. Today, a rack might be 120 or 140 kilowatts, but as we move into the next generation of NVIDIA Kyber and similar technologies, it’s almost 1 megawatt.

Furthermore, if you do an entire row, it will include racks, networking, cooling, and power, all integrated together. Now when you come on-site, the amount of cabling you need to lay is much less. There are fewer network fibers, fewer power connections, and fewer plumbing connections. This can significantly reduce the number of people working in data centers, so our capacity to build them will be much greater.

In the process, some will transition to new things faster, while others will be slower. Crusoe and Google have been talking about this modularization, and Meta and many other companies are as well. Those who transition to new things faster may encounter delays, while those who are slower will face workforce issues. There will always be misalignments in the market because it is a very complex supply chain. Ultimately, it is still simple enough that we will be able to solve it within the required time frame through capitalism and human ingenuity.

01:54:44 – Space GPUs are unlikely to be realized within this decade

Dwarkesh Patel

Speaking of major problems to solve: Elon Musk is very optimistic about space GPUs. If you are right that electricity on Earth is not a problem... I think they have other reasons to believe this makes sense. Even if there will be enough gas turbines or other generation on Earth, Elon's next argument is that you cannot get permits to build hundreds of gigawatts on Earth. Do you accept that argument?

Dylan Patel

From a land perspective, the U.S. is vast. Data centers actually don't take up that much space, so you can solve that issue. From a permitting standpoint, air pollution permits are a challenge, but the Trump administration made it much easier. You go to Texas, and you can skip a lot of that red tape.

Elon had to deal with a lot of that complexity in Memphis, and then built a power plant across the state line for Colossus 1 and 2. But ultimately, in central Texas, there is a lot more you can get away with.

Dwarkesh Patel

Given that Elon lives in Texas, why didn't he go to Texas?

Dylan Patel

I think part of the reason is that they were overly reliant on grid power at some point. That was just something they thought they needed more of at the time.

Dwarkesh Patel

Because there is an aluminum refinery connected to the grid there.

Dylan Patel

It's actually an idle appliance factory. But I think they may have valued grid power, water resources, and natural gas resources more. I think they knew when they bought that place that the gas pipeline was right there, and they intended to tap into it. Water resources too. There are a bunch of different limiting factors. It might also be an easier place to find electricians.

Ultimately, I don't know exactly why they chose that location. I bet if Elon could choose again, considering the regulatory challenges he faces, he would choose somewhere in Texas. In the end, permitting is a challenge, but there are 50 states in the U.S., there's plenty of space, and things can always get done.

There are many small jurisdictions where you can temporarily bring in all the workers you need for three to twelve months, depending on the contract. You can house them in temporary accommodations and pay high wages because labor is very cheap compared to the value of GPUs, networks, and the tokens that will ultimately be generated. So there's enough room to pay for all of that.

Additionally, people are diversifying now. Australia, Malaysia, Indonesia, and India are all places where data center construction is happening faster. But currently, over 70% of AI data centers are still in the U.S., and that trend continues. People are figuring out how to build these things. Ultimately, dealing with permits and red tape in remote areas of Texas, Wyoming, or New Mexico might be much easier than sending things into space.

Dwarkesh Patel

Besides considering that energy is only a small part of the total ownership cost of data centers, which makes the economic argument for space data centers less compelling, what other reasons do you have for being skeptical?

Dylan Patel

Obviously, electricity in space is basically free.

Dwarkesh Patel

That's why you do it.

Dylan Patel

Yes, that's the reason. But there are all the other counterarguments. Even if the cost of electricity on Earth doubles, it's still a small part of the total cost of GPUs.

The main challenge is... we have ClusterMAX, which rates GPU cloud providers. We tested over 40 cloud companies, including hyperscale cloud providers and newer GPU clouds. Aside from software, the biggest difference among these cloud providers is their ability to deploy and manage failures.

The reliability of GPUs is very poor. Even today, about 15% of deployed Blackwell units need to be returned (RMA). You have to take them out. Sometimes you just need to reseat them, but other times you have to remove them and send them back to NVIDIA or their RMA partners.

Dwarkesh Patel

What do you think of Elon’s argument that after the initial phase, they actually don’t fail that much?

Dylan Patel

Sure, but now you’ve done all this, tested them all, disassembled, put them on a spacecraft, launched them into space, and then brought them online. That takes months. If your argument is that GPUs have a five-year lifespan, and this takes an additional six months, that’s equivalent to 10% of your cluster's lifespan.
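To make that arithmetic concrete, here is a minimal sketch in Python; both numbers are the conversation's own assumptions, not measured figures:

```python
# Deployment delay as a fraction of useful GPU life, using the
# figures quoted above (both are assumptions from the conversation).

GPU_LIFESPAN_YEARS = 5.0    # assumed useful life of a GPU
SPACE_DELAY_YEARS = 0.5     # test on Earth, disassemble, launch, bring up

fraction_idle = SPACE_DELAY_YEARS / GPU_LIFESPAN_YEARS
print(f"Cluster lifespan lost to deployment delay: {fraction_idle:.0%}")
# -> 10%, before counting that the earliest months are the most valuable
```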

Because we are so constrained by computing power, this power is theoretically most valuable in the first six months. We are more constrained now than we will be in the future. This computing power can help build better models in the future or generate revenue today, allowing you to raise more funds. All of this makes now the most critical moment, but you might have delayed the deployment of computing power by six months.

What distinguishes these cloud providers is... we see some cloud providers taking six months to deploy GPUs on Earth. We see some cloud providers taking far less than six months. So the question is, where does space deployment rank? I can’t see how you could possibly test them all on Earth, disassemble them, and then transport them to space without spending much longer than keeping them in the testing facility.

Dwarkesh Patel

The question I want to ask is about the topology of space communication. Right now, Starlink satellites communicate at 100 Gbps. You can imagine that with optimized optical inter-satellite laser links, this speed could be much higher. That is actually very close to the bandwidth of InfiniBand, which is 400 Gbps.

Dylan Patel

But that’s per GPU, not per rack. So that needs to be multiplied by 72. Also, that’s Hopper. When you get to Blackwell and Rubin, that number will double and double again.
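A rough back-of-envelope comparison of these two quantities, assuming the figures quoted in the exchange (100 Gbps per laser link, 400 Gbps per Hopper-generation GPU, 72 GPUs per rack, doubling each generation):

```python
# Satellite laser links vs. per-rack scale-out bandwidth, using the
# figures quoted in the conversation (all are rough assumptions).

SATELLITE_LINK_GBPS = 100   # Starlink-class inter-satellite laser link
PER_GPU_GBPS = 400          # InfiniBand NDR per GPU, Hopper generation
GPUS_PER_RACK = 72

for generation, multiplier in [("Hopper", 1), ("Blackwell", 2), ("Rubin", 4)]:
    rack_gbps = PER_GPU_GBPS * multiplier * GPUS_PER_RACK
    links = rack_gbps / SATELLITE_LINK_GBPS
    print(f"{generation:9}: {rack_gbps / 1000:5.1f} Tbps per rack "
          f"= {links:.0f} satellite links just to match it")
```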

Dwarkesh Patel

But during inference, how much computation still has to coordinate across different scale-up domains? Or is inference just being done as a batch within a single scale-up domain?

Dylan Patel

Many models can fit within a single scale-up domain, but many times you will split them across multiple scale-up domains.

As models become increasingly sparse (which is a general trend), you want each GPU to query only a few experts. If today's leading models have hundreds or even thousands of experts, then you might want to run this model on hundreds or thousands of chips, even as we move into the future.

So ultimately, you will encounter the problem of needing to connect all these satellites for communication.

Dwarkesh Patel

That would be difficult. If there were a world where you could perform inference on a batch of requests within a single scale-up domain, then it might be more feasible. But if not, that's another story.

Dylan Patel

Networking these chips is a problem; you can't make satellites infinitely large. There are many physical challenges to making satellites very large. That's why you need these interconnections between satellites.

These interconnections are more expensive. In a cluster, 15-20% of the cost is networking. Suddenly, you're using space lasers instead of the simple lasers in pluggable transceivers that are manufactured by the millions.

And these things are also very unreliable, by the way, more unreliable than GPUs. Throughout the lifecycle of the cluster, you often need to unplug them and clean them. You need to unplug and replug them for various random reasons. These things are just not that reliable. So you also encounter that problem. You have a more expensive and complex space laser for communication instead of these mass-produced pluggable optical transceivers.

Dwarkesh Patel

So overall, what does this mean for space data centers?

Dylan Patel

Space data centers are not actually saved by their energy advantage. They are constrained by the same scarce resource. By the end of this decade, we can only manufacture two hundred gigawatts' worth of chips per year. The question is what we do to obtain those two hundred gigawatts, and it doesn't matter whether that's on land or in space, because you can build that much power on Earth. Human capacity and capability can scale to the point where we add power of various types at terawatt levels globally each year.

At some point, we will indeed cross the chasm and space data centers will make sense, but not within a decade. That is much further away: once energy truly becomes the major bottleneck, once land permits become a bigger bottleneck as AI takes a larger share of the economy, and crucially, once chips are no longer a bottleneck.

Right now, chips are the biggest bottleneck. You want them to be deployed and used for AI as soon as they are manufactured. People are doing a lot to speed up this process. They are modularizing data centers, even modularizing racks, where you just put the chips into the data center, and everything else is already wired and ready to go. People are doing things like this to shorten this time, and you can't do that in space.

Ultimately, in a chip-constrained world, what matters most is to get these chips generating tokens as quickly as possible. Perhaps by 2035, the semiconductor industry (ASML, Zeiss, and equipment suppliers like Lam Research and Applied Materials, along with the wafer fabs) will catch up. Once the pendulum swings back and we can manufacture enough chips, then we will optimize every parameter, and shaving 10-15% off energy costs would be meaningful. And as we may shift to ASICs, where NVIDIA's 70%+ margin no longer dominates the bill, energy could become 30% of cluster costs. These are all worth optimizing.

But Elon doesn't win by making 20% improvements. He never wins that way. Elon only wins when he goes all out and makes 10x improvements. That is the mission of SpaceX. That is the mission of Tesla. All his successes are related to this, rather than chasing that 20% improvement. I believe that as Earth's resources become scarcer, space data centers will eventually become a 10x improvement, but that is not something for this decade.

Dwarkesh Patel

To give everyone a sense of how much land there is on Earth... Obviously, for the chips themselves, especially if you enter a world of megawatt-class racks like the Kyber racks mentioned earlier—

Dylan Patel

That's another matter. If manufacturing is the limiting factor, AI chips are currently about 1 watt per square millimeter. A simple way to improve this is to raise it to 2 watts per square millimeter. You might not get double the performance, perhaps only a 20% performance boost, and that would require more exotic cooling methods. It needs more complex cold plates, intricate liquid cooling, or even possibly something like immersion cooling.

In space, achieving higher watts per square millimeter is difficult, while on Earth these are already solved problems. Any one of these things could let you gain more tokens, perhaps producing 20% more tokens per wafer manufactured, which is a huge win.
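A sketch of that trade-off in Python; the die area and dies-per-wafer below are illustrative assumptions, not figures from the conversation:

```python
# Power density vs. tokens per wafer, per the trade-off described
# above: doubling W/mm^2 buys only ~20% more performance per die,
# but in a wafer-limited world that 20% is still a win.

DIE_AREA_MM2 = 800       # hypothetical large AI die
DIES_PER_WAFER = 60      # hypothetical good dies per 300 mm wafer

scenarios = [("today", 1.0, 1.00), ("exotic cooling", 2.0, 1.20)]
for name, w_per_mm2, relative_perf in scenarios:
    die_watts = w_per_mm2 * DIE_AREA_MM2
    print(f"{name:14}: {die_watts:5.0f} W/die, "
          f"relative tokens per wafer = {relative_perf:.2f}x")
```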

Dwarkesh Patel

Square millimeters, are you referring to the area of the die?

Dylan Patel

Yes, the area of the die.

Dwarkesh Patel

It would be more advantageous in space, because higher watts per square millimeter means the chip runs hotter. I think this is a chip engineering issue, but according to the Stefan-Boltzmann law, radiative cooling is proportional to the fourth power of temperature. If you can make the chip run very hot, that allows for a lot—

Dylan Patel

No, you can't make it hotter. You can only increase the power per unit area. The problem is that extracting heat from that dense area means you have to shift from standard air and liquid cooling to more exotic forms of liquid cooling, or even immersion cooling, to achieve higher power density. This is harder in space than on Earth.
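For intuition on why density is harder in space: in vacuum, the only way to reject heat is radiation, governed by the Stefan-Boltzmann law Dwarkesh invokes. A minimal sketch; the emissivity and radiator temperatures are illustrative assumptions:

```python
# Radiator area needed per megawatt of chip power in space.
# Stefan-Boltzmann: radiated power = emissivity * sigma * area * T^4.
# Chips must stay cool, which caps how hot the radiator can run,
# and a cool radiator radiates weakly -> very large panel areas.

SIGMA = 5.670e-8    # Stefan-Boltzmann constant, W / (m^2 * K^4)
EMISSIVITY = 0.9    # assumed radiator surface emissivity

for temp_k in (300, 350, 400):          # assumed radiator temperatures
    flux_w_m2 = EMISSIVITY * SIGMA * temp_k**4
    area_m2 = 1e6 / flux_w_m2           # per megawatt, one-sided panel
    print(f"T = {temp_k} K: {flux_w_m2:6.0f} W/m^2 "
          f"-> {area_m2:5.0f} m^2 of radiator per MW")
```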

Dwarkesh Patel

Perhaps at this point, it's worth explaining what scale-up really is, and what it looks like for NVIDIA, Trainium, and TPU.

Dylan Patel

Earlier I mentioned that communication within the chip is super fast. Communication between chips within the same rack is also fast, but not as fast; it's at the terabyte-per-second level. For distant places, like across countries, communication is at the gigabyte-per-second level.

The scale-up domain is a tightly-knit domain where chips communicate at terabytes per second. For NVIDIA, this previously meant an H100 server with 8 GPUs that could communicate with each other at terabytes per second. With the Blackwell NVL72, they achieved rack-level scale-up: all 72 GPUs in the rack can connect at terabytes per second. The speed doubles with each generation, but the most significant innovation is increasing the number of GPUs in the domain from 8 to 72.

When we look at Google, their scale-up domain is entirely different. It has always been on the order of thousands. For TPU v4, a pod is about 4,000 chips. For v7, pods are in the 8,000-9,000 chip range. It is worth noting that this is different from NVIDIA; the two are not directly comparable.

Google has a toroidal topology. Each chip connects to six neighbors. NVIDIA's 72 GPUs are all-to-all interconnected. They can send data to any other chip in that scale-up pod at TB per second. In contrast, Google requires routing through other chips. If TPU 1 needs to communicate with TPU 76, it must route through various chips, and when you do this, there will always be some resource contention because that TPU is only connected to six other TPUs.

So there are differences in topology and bandwidth, with each having its pros and cons. Google can achieve a large-scale scale-up domain, but at the cost of needing to route through other chips to get from one chip to another. You can only communicate with six direct neighbors.

Amazon has changed their scale-up domain. They are somewhere between NVIDIA and Google. They are trying to create a larger scale-up domain. They are attempting to implement an all-to-all interconnect like NVIDIA to some extent, using switches, but they also use a toroidal topology similar to Google's to some degree.

As we move towards the next generation, all three companies are increasingly turning to dragonfly topologies, where some components are fully connected and others are not. You can scale up to hundreds or thousands of chips with far less contention when routing.
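A toy illustration of the topology difference described above; the 16x16x16 pod layout is an assumption chosen for round numbers, not a confirmed TPU configuration:

```python
# Hop counts: all-to-all (NVL72-style) vs. a 3D torus (TPU-style),
# where each chip has six neighbors and traffic may route through
# intermediate chips, contending for their links.

def torus_hops(a, b, dims):
    """Minimal hops between chips a and b in a wraparound 3D torus."""
    hops = 0
    for ai, bi, n in zip(a, b, dims):
        d = abs(ai - bi)
        hops += min(d, n - d)      # wraparound: go the shorter way
    return hops

DIMS = (16, 16, 16)                # assumed 4,096-chip pod layout
print("all-to-all: 1 hop between any pair of the 72 GPUs")
print("3D torus  :", torus_hops((0, 0, 0), (8, 8, 8), DIMS),
      "hops worst-case across the pod")   # -> 24
```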

Dwarkesh Patel

Related question: I have heard someone claim that the reason parameter counts have scaled slowly—until now we have only seen larger models from OpenAI and Anthropic—is that... the original GPT-4 had over a trillion parameters, and only now are models starting to approach that scale again. I heard a theory suggesting the reason is that NVIDIA's scale-up domain has never had that much memory capacity. Suppose you have a 5T-parameter model running in FP8; at one byte per parameter, that is 5 TB of weights. Then there is the KV cache, assuming the same size.

Dylan Patel

Let's call it the same size.

Dwarkesh Patel

Okay, let's assume it's the same size for a batch of requests. So you need 10 TB to run one forward pass.

Dylan Patel

One forward pass, yes.

Dwarkesh Patel

Only with GB200 NVL72 does NVIDIA's scale-up domain reach about 20 TB; before that, domains were much smaller. On the other hand, Google has always had these massive TPU pods which, although not fully interconnected, have hundreds of TB of capacity within a single scale-up domain. Does this explain why parameter scaling has been slow?
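Before the answer, a quick check of the arithmetic in this exchange. The per-GPU HBM figures below are my assumptions (public headline capacities), and the ~20 TB quoted above presumably counts more than HBM alone:

```python
# Does one forward pass of a 5T-parameter FP8 model fit in a
# scale-up domain? Weights: 1 byte/param. KV cache: assumed equal.

PARAMS = 5e12
weights_tb = PARAMS * 1 / 1e12      # FP8 -> 5 TB of weights
total_tb = weights_tb * 2           # + a KV cache of the same size

domains_tb = {
    "HGX H100 (8 x 80 GB)": 8 * 0.080,        # 0.64 TB of HBM
    "GB200 NVL72 (72 x 192 GB)": 72 * 0.192,  # ~13.8 TB of HBM
}
print(f"needed: ~{total_tb:.0f} TB per forward pass")
for name, cap in domains_tb.items():
    print(f"{name}: {cap:5.2f} TB -> fits: {cap >= total_tb}")
```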

Dylan Patel

I think part of the reason is capacity and bandwidth, but also because building larger models slows down deployment. In terms of end-user inference speed, this is somewhat irrelevant. The real key is reinforcement learning (RL).

What we see in these models and the allocation of computing power in the lab is... there are mainly a few ways to allocate computing power. You can allocate it to inference, i.e., revenue. You can allocate it to development, i.e., building the next model. You can allocate it to research. In development, it is specifically divided into pre-training and reinforcement learning.

When you think about what is happening, the computational efficiency gains from research are so significant that you actually want most of the computing power to be used for research rather than development. All these researchers are generating new ideas, trying them out, testing them, and continuously pushing the Pareto optimal curve of scaling laws forward. Empirically, we see model costs decreasing by a factor of 10 each year, or even more. Costs decrease by a factor of 10 at the same scale, while reaching new frontiers costs the same or more. So you wouldn't want to allocate too many resources to pre-training and reinforcement learning. You actually want to allocate most of the resources to research.
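The compounding Patel cites is easy to state as arithmetic; a minimal sketch of the claim, taking the "10x per year at the same capability" figure at face value:

```python
# Cost for a *fixed* capability falls ~10x per year, while the
# frontier itself stays as expensive or more (per the claim above).

FRONTIER_COST_TODAY = 1.0      # normalized training cost, year 0
for years_later in range(4):
    same_capability_cost = FRONTIER_COST_TODAY / 10**years_later
    print(f"year +{years_later}: same-capability cost = "
          f"{same_capability_cost:.4f}x")
```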

In the middle is the development phase. If you pre-train a 50 trillion parameter model, how many rollouts are needed in reinforcement learning? A rollout for a 50 trillion parameter model costs five times as much as one for a 10 trillion parameter model. If you want to do the same number of rollouts—perhaps the larger model has twice the sample efficiency—you now need 2.5 times as long for reinforcement learning to make the model smarter.

Alternatively, you can do reinforcement learning for twice as long on a smaller model. If the large model has twice the sample efficiency and does X rollouts, there is still a 25% difference. The smaller model (10 trillion parameters), although less sample-efficient, does twice the rollouts and still finishes faster. You get the model earlier, you do more reinforcement learning, and then you can use that model to help build the next model, assist your engineers, and pursue all these research ideas.
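Plugging in the numbers from this argument (all three inputs are assumptions quoted from the conversation):

```python
# RL wall-clock trade-off: a 50T model's rollout costs ~5x a 10T
# model's, but is assumed to be ~2x more sample-efficient.

ROLLOUT_COST_RATIO = 5.0     # 50T vs 10T, cost per rollout
SAMPLE_EFFICIENCY = 2.0      # big model needs half the rollouts

big_time = ROLLOUT_COST_RATIO / SAMPLE_EFFICIENCY   # 2.5x baseline
small_time_2x_rollouts = 2.0                        # double the RL on 10T
print(f"50T model RL time : {big_time:.1f}x")
print(f"10T model, 2x RL  : {small_time_2x_rollouts:.1f}x "
      f"(the 50T run takes "
      f"{big_time / small_time_2x_rollouts - 1:.0%} longer)")
```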

This feedback loop tends to favor smaller models in any case, regardless of your hardware. When you look at Google, they have indeed deployed the largest production model among all major labs, Gemini Pro. It is larger than GPT-5.4. It is larger than Opus. Google does this because they have a monolithic computing cluster. It is almost entirely TPU.

Anthropic, on the other hand, has to deal with H100, H200, Blackwell, Trainium, and various generations of TPU. OpenAI currently mainly uses Nvidia but is also starting to use AMD and Trainium. A computing cluster like Google's can be optimized around a larger model. They can leverage a thousand chips in a scale-up domain to significantly accelerate reinforcement learning speed, making this feedback loop faster.

But ultimately, viewed in isolation, you will almost always choose the smaller model: it gets through reinforcement learning faster and is deployed earlier into research and development, so you can build the next thing and bank more efficiency gains. You spend less computing power on training, which frees more computing power for research. That compounding ability to do research faster may lead to a quicker takeoff, and that is what all these companies want: the fastest possible takeoff.

02:14:07 – Why aren't more hedge funds involved in AGI investments?

Dwarkesh Patel

Okay, a sharp question. You've explained how SemiAnalysis sells these spreadsheets. You always point out that six months or a year ago, you warned people about the memory crisis. Now you are telling people about the cleanroom crisis, and there will be a tool crisis in the future. Why is Leopold the only one who has made a fortune using your spreadsheets? What is everyone else doing?

Dylan Patel

I think many people are making money in many ways. Leopold jokes that he is the only client who told me our numbers were too low. Everyone else told me our numbers were too high, almost without exception. Whether it's one hyperscale cloud provider saying, "Hey, that other hyperscaler's numbers are too high," we respond, "No, that's just how it is." They say, "No, no, no, that can't be," and so on. When we work with hyperscale cloud providers or AI labs, you ultimately have to convince them with all these facts and data that, no, that number is not too high; it is correct. Sometimes it takes them six months, or a year, to realize it.

Other clients also use our data for trading. About 60% of the business comes from the industry. So it's AI labs, data center companies, hyperscale cloud providers, semiconductor companies, the entire AI infrastructure supply chain. But 40% of our revenue comes from hedge funds. I won't comment on who our clients are, but many people use this data. The question is how you interpret it and how you view the future beyond it.

I would say that Leopold is almost the only one who always tells me my numbers are too low. Sometimes he's too high, sometimes I'm too low. But overall, I think others are doing the same. You can look across the hedge fund field, see their 13F filings, see what they hold. Maybe not exactly what Leopold holds, because you're always asking: what is the most constrained thing? What deviates most from expectations?

That’s what you really want to capitalize on: inefficiencies in the market. In a sense, our data makes the foundational data of what’s happening more accurate, making the market more efficient. Many funds do trade based on existing information... I don’t think Leopold is the only one. I do think he has the most faith in AGI taking off, though.

Dwarkesh Patel

Right, but these bets aren’t about what will happen in 2035. The bets you’re making—at least as reflected in the public returns of the different funds we can see (including Leopold’s fund)—are about what has happened in the past year. What has happened in the past year can be predicted with your spreadsheets. The key is to buy the spreadsheet for the next year.

Dylan Patel

They’re not just spreadsheets. There are reports. There’s API access to data. There’s a ton of data.

Dwarkesh Patel

But do you understand what I mean? It’s not about some crazy singularity event. It’s about, do you believe in the memory crisis?

Dylan Patel

You only believe in the memory crisis if you believe AI will take off on a large scale. The memory crisis, for a large part, is based on... at least for those thinking about infrastructure in the Bay Area, it’s obvious. As the context length increases, the KV cache explodes, so you need more memory. Then you perform the calculations.

You also have to have a lot of supply chain understanding about which fabs are being built, which data centers are being built, how many chips, and so on. We track all these different datasets very closely, but ultimately, someone needs to fully believe this will happen.

A year ago, if you told someone that memory prices would quadruple and smartphone sales would drop 40% in the next year or two, people would say, "You’re crazy. That will never happen." Except some people did believe that, and those people did trade memory.

And indeed, some people did. I don’t think Leopold is the only one buying memory companies. He certainly does it better than some, perhaps most, in terms of scale, position, and how he operates. I don’t want to comment on whose returns are how, but he has indeed done very well. Others have also done very well.

Wow, you’ve made me diplomatic for the first time ever. No, you’re fine. I think it’s funny. I’m acting like a diplomat when I’m usually quite spicy.

02:18:30 – Will TSMC squeeze Apple off the N2 process?

Dwarkesh Patel

Alright, let's do a few quick questions and answers. You've said that in memory, logic, and so on, N3 will mainly be used for AI accelerators. But there's also N2, which is primarily used by Apple... In the future, I think AI will also want to use N2. If NVIDIA, Amazon, and Google say, "Hey, we're willing to pay a lot for N2 capacity," can TSMC kick Apple out?

Dylan Patel

I think the challenge here is that the timeline for chip design is long, so that's something that will happen more than a year from now, and using 2-nanometer designs is even further out.

What will really happen is that NVIDIA and all the other companies will say, "Hey, we want to prepay for capacity, and you need to expand capacity for us." Maybe TSMC can make a little profit, but not much. They won't completely kick Apple out. What they will do is when Apple orders X, they might say, "Hey, we expect you only need X minus one, so we will give you X minus one." Then that portion of flexible capacity, Apple will be a bit shortchanged.

Traditionally, Apple always overbooks by 10% and reduces by 10% throughout the year. In some years, they fully utilize that 10%. Sales fluctuate based on seasonality and macroeconomics.

I don't think TSMC will kick Apple out. I think Apple will become an increasingly smaller part of TSMC's revenue, thus reducing the relevance of TSMC meeting their demands. TSMC may eventually start saying, "Hey, you need to pre-order capacity for next year and the year after, and you need to prepay capital expenditures," because that's what NVIDIA, Amazon, and Google are doing.

Dwarkesh Patel

I'm curious if it's worth delving into specific numbers. I don't have them on hand. What is Apple's share of N2 compared to AI in the coming years?

Dylan Patel

This year, Apple has most of the N2 capacity that will be produced. AMD has a little bit. They are trying to manufacture some AI chips and CPU chips early. A little bit, but most of it is Apple.

As we move into the following year, as others start mass production, Apple will still hold close to half of the share, but then it will drop sharply, just like N3, where they once held half. When I say N2, this includes the A16, which is a variant of N2. Over time, these nodes will become mainstream.

Interestingly, Apple has traditionally been the first onto new process nodes. 2nm is actually the first time they are not first. Well, except for Huawei: Huawei was among the first along with Apple back in 2020 and before, but both were making smartphones. Now, with 2nm, AMD is trying to manufacture CPU and GPU chiplets in the same timeframe, using advanced packaging to put them together. This is a big risk for AMD and could lead to delays, because it is a brand-new process technology and it is difficult. But ultimately it's a gamble; they want to scale faster than NVIDIA and try to beat them.

As we move forward, when we turn to the A16 node, the first customer there is not even Apple. It's AI. As we progress, this will become increasingly common. Apple will not only not be the first to enter the new node, but it will also not be the main user of the new node. They will become just like any old customer.

Because TSMC's capital expenditure keeps expanding while Apple's business is not growing as fast, Apple is becoming a less and less important customer. They will also cut orders due to various factors in the supply chain, whether it's packaging, materials, DRAM, or NAND. The costs of these things are rising, and Apple may not be able to pass all of them on to customers, because consumer demand is not that strong. You ultimately end up in this dilemma, and they are no longer TSMC's best friend the way they historically were.

Dwarkesh Patel

Do you think if Huawei could use 3nm, they would create an accelerator better than Rubin?

Dylan Patel

It's possible, yes. Huawei was the first company to have a 7nm AI chip, and they were among the first to have a 5nm mobile chip. Huawei's Ascend was two months ahead of the TPU and four months ahead of NVIDIA's A100, I think.

That's just moving to a new process node. It says nothing about software or hardware design, or all the other pieces. But Huawei is arguably the only company in the world that has every piece. Huawei has cracked software engineering. Huawei has cracked networking; in fact, that is historically their biggest business. They have cracked AI talent.

Moreover, outside of NVIDIA, they arguably have the best AI researchers. Unlike NVIDIA, they have their own foundry. Unlike NVIDIA, they have end markets, like selling tokens. Huawei is able to attract top talent. NVIDIA can too, but not as concentrated, and Huawei has a larger talent pool in China.

It is a controversial claim, but if Huawei could use TSMC, they would arguably be better than NVIDIA. In certain areas, China has advantages that NVIDIA cannot easily access. It's not just scale; certain optical technologies are genuine strengths of China's.

I think it's reasonable to say that if Huawei had not been banned from using TSMC in 2019, they might have surpassed Apple to become TSMC's largest customer. Huawei has a significant share in networking, computing, CPUs, and all these areas. They would continuously gain market share, and they are likely to be TSMC's largest customer.

02:24:16 – Robots

Dwarkesh Patel

Wow. That's crazy. I have a random last question for you. Another part of the Elon interview is about robots. If the proliferation of humanoid robots exceeds people's expectations, and by 2030 there are millions of humanoid robots running around, each requiring local computing power, what are your thoughts on what that means? What would that require?

Dylan Patel

There are many challenges in deploying vision-language models (VLMs) and vision-language-action models (VLAs) on robots. But to some extent, you don't need to put all the intelligence on the robot. A more efficient approach is not to, because in the cloud you can do batch processing and so on.

What you might want to do is have a more capable model running in the cloud at a very high batch size, handling most of the planning and longer-horizon tasks. Then it pushes those instructions to the robot, which interpolates between each subsequent action. Or it is given a command like, "Hey, pick up that cup," and the model on the robot can pick up the cup. When it picks up the cup, factors like weight and force may need to be determined by the model on the robot, but not everything has to be. It can say, "Hey, that's a headset," and the super model in the cloud can say, "I know those headphones are Sony XM6s." This is not a Dwarkesh advertisement, but...

Dwarkesh Patel

I was thinking at the time, why is this guy pushing this thing so hard? It's right there on the table. When we interviewed Satya Nadella, it was around his neck. Did Sony pay him?

Dylan Patel

Unfortunately, no. But anyway, it might say, "Hey, the headband is soft, the weight is this," and so on. Then the model on the robot can be less intelligent, accept those inputs, and perform actions. It might receive instructions from the cloud model once per second, or ten times per second, depending on the frequency of actions. But a lot can be offloaded to the cloud.

Otherwise, if you do all the processing on the device, I believe it would be more expensive. First, you can't batch. Second, you can't have intelligence as high as in the cloud, because the cloud models will simply be larger. Third, we are in a world of semiconductor shortages, and any robot you deploy needs leading-edge chips, because the power budget on a robot is really tight; you need low-power, efficient silicon. Suddenly, you are putting the power and chips that would have gone into AI data centers into robots. So if you deploy millions of humanoid robots, that 200 gigawatts gets eaten into.
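A minimal sketch of the split Patel describes; every name, rate, and number here is hypothetical, intended only to show the shape of the control loop:

```python
# Cloud planner at low frequency + on-robot policy at high frequency.

def cloud_planner(observation):
    # Stand-in for a large, batched cloud model (~1-10 Hz):
    # long-horizon planning plus world knowledge ("that's a cup").
    return {"task": "pick_up_cup", "grip_force_hint": 0.3}

def onboard_policy(command, sensors):
    # Stand-in for a small on-robot model (~50 Hz): interpolates
    # motor actions between cloud commands using local feedback
    # (e.g. the cup's actual weight, detected slip).
    grip = command["grip_force_hint"] + 0.5 * sensors["slip"]
    return {"grip_force": grip}

command = cloud_planner(observation={"camera_frame": None})
for _ in range(50):                    # one second at 50 Hz
    sensors = {"slip": 0.02}           # placeholder sensor reading
    action = onboard_policy(command, sensors)
# A real system would re-query the planner roughly once per second.
```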

Dwarkesh Patel

I find this very interesting because people may not realize one characteristic of the future: how physically concentrated intelligence will be. Right now, there are 8 billion humans, and their compute is in their heads, carried with them.

In the future, even with robots operating in the physical world—obviously, knowledge work will be done in a centralized way in data centers, with tens of thousands or even millions of instances—you are implying a future where more centralized thinking and computation drive millions of robots around the world. This is an interesting fact about the future that people may not realize.

Dylan Patel

I think Elon recognizes this, which is why he is looking everywhere for different places to get his chips. He signed this huge deal with Samsung to manufacture his robot chips in Texas.

Besides NVIDIA's newly announced LPU, no one is really making AI chips at Samsung. They are set to release it next week, but we recorded this a week before that.

Dwarkesh Patel

This episode will air on Friday.

Dylan Patel

Oh, this episode will air before that. Awesome. They are releasing this new AI chip next week, manufactured at Samsung, but that is NVIDIA's own recent move. That's the only other AI demand there; everything else is fought over at TSMC. So he gets geopolitical diversification and supply-chain diversity for his robots, and he won't be up against the unlimited willingness to pay of those "geniuses" in the data centers.

Dwarkesh Patel

Alright, Dylan. That's great. Thank you so much for coming on the podcast.

Dylan Patel

Thank you for having me. See you tonight.