Power Play: Data Centres & Semicon
Why power is the primary constraint on training frontier models, and why China has a massive AI advantage
Happy Sunday!
Every now and then, I stumble on a podcast that feels like a 100x return on time spent because it interviews guests with specific domain knowledge that is difficult to find elsewhere. When I do so, I feel compelled to share it with you.
I just finished listening to the Dwarkesh Podcast, where he interviews two experts in the semiconductor space—Dylan Patel from Semianalysis (an AI hardware research firm) and Jon Y from Asianometry (popular YouTube creator on semiconductors).
This one isn’t specifically about Crypto AI, but it’s a fascinating listen if you’re interested in AI. It’ll shape your views of how the future will play out. It’s worth the full two hours.
Now, I’m no semiconductor expert, but this episode gave me a whole new appreciation for how critical this space is to AI’s future—especially when it comes to building data centres and scaling AI infrastructure.
If you’re strapped for time, I’ve taken the liberty of summarizing my key takeaways for you in a 5-minute read.
Power Demand and Data Centres
Training large AI foundational models requires massive computing resources, which is driving unprecedented power demand.
Power is often the primary constraint on scaling compute clusters. And it’s not about cost; it’s about the ability to bring new generation capacity online to supply the data centres, and about government regulations on power use.
Data centres are being pushed to higher power limits, and their energy efficiency becomes critical in determining how many more FLOPS (floating-point operations per second) can be delivered. A 10-gigawatt data centre will not be far-fetched in the future.
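To make the scale concrete, here’s a rough back-of-envelope sketch (the 1 TFLOPS-per-watt efficiency and 1.3x overhead factor are my own illustrative assumptions, not figures from the podcast) of how a facility power budget translates into delivered compute:

```python
# Back-of-envelope: how much sustained compute a power budget buys.
# All figures below are illustrative assumptions, not numbers from the podcast.

def delivered_flops(power_watts: float, flops_per_watt: float, overhead: float = 1.3) -> float:
    """Estimate sustained FLOPS for a data centre, given a facility power budget,
    accelerator efficiency (FLOPS per watt), and an overhead factor covering
    cooling, networking, and power conversion losses."""
    compute_power = power_watts / overhead  # watts actually reaching the accelerators
    return compute_power * flops_per_watt

# Hypothetical 10 GW facility with accelerators at ~1 TFLOPS per watt:
gigawatt = 1e9
print(f"{delivered_flops(10 * gigawatt, 1e12):.2e} FLOPS sustained")
# -> ~7.7e21 FLOPS (about 7.7 zettaFLOPS) under these assumptions
```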
One critical challenge in data centres is data movement, which consumes far more power than the computing operations themselves. Minimizing data movement and maximizing compute efficiency can lead to significant energy savings.
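As a rough illustration of why movement dominates, compare order-of-magnitude energy costs for an arithmetic operation versus pulling its operands from off-chip DRAM. The per-operation energies below are assumed for the sake of the example; real values vary widely by process node and memory technology:

```python
# Illustrative comparison: energy of arithmetic vs. energy of data movement.
# The per-operation energies are assumed, order-of-magnitude values only.

PJ = 1e-12  # one picojoule, in joules

fp32_multiply_energy = 4 * PJ          # assumed energy for one on-chip FP32 multiply
dram_read_energy_per_byte = 100 * PJ   # assumed energy to fetch one byte from DRAM

# One FP32 multiply needs two 4-byte operands if they miss the on-chip caches.
movement_energy = 2 * 4 * dram_read_energy_per_byte
ratio = movement_energy / fp32_multiply_energy

print(f"Fetching operands from DRAM costs ~{ratio:.0f}x the multiply itself")
# Under these assumptions the movement costs ~200x the arithmetic, which is why
# data reuse (caching, tiling, keeping data in on-chip SRAM) saves so much energy.
```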
Architectural advancements, even if process nodes remain the same, have the potential to improve energy efficiency by as much as 100x. These architectural changes could allow data centres to deliver much more computing power without increasing energy consumption proportionally.
China’s Massive Advantage in AI
China’s capacity to rapidly scale its power infrastructure is a major advantage.
China can build large-scale power plants and substations quickly, allowing it to support immense data centres with ease. For instance, China could potentially build a gigawatt-scale data centre around a site like the Three Gorges Dam in six months.
By 2027, China could be operating data centres on the order of 10 gigawatts, far beyond what the U.S. is currently planning or capable of. This gives China a major advantage in training massive AI models on a centralized infrastructure.
In contrast, the U.S. faces significant challenges in building large-scale data centres due to limitations in its power generation and distribution infrastructure. The lack of substantial additions to the U.S. power grid creates a bottleneck in supporting the next wave of AI compute scaling.
Insights on Semiconductor Manufacturing
Taiwan Semiconductor Manufacturing Company (TSMC), the world’s largest chip foundry, keeps its recipe for semiconductor innovation highly secret; the process involves multivariable problems that require intense experimentation.
Refining semiconductor manufacturing relies heavily on tacit knowledge, often passed down in a secretive master-apprentice style. This has allowed Taiwan to continue dominating the semiconductor industry.
The semiconductor industry is highly fragmented, with various companies dominating different parts of the value chain. This fragmentation means vertical integration has been largely abandoned in favour of specialization. Each component, from chemicals to specific types of chips, has only a few competitors.
Corporate espionage within the semiconductor industry is rampant. Chinese hackers have infiltrated key players like ASML, which makes the lithography systems essential to chip manufacturing. China’s semiconductor industry has also relied on poaching Taiwanese talent from TSMC.
Rising Costs of Advanced Nodes
Progressing to smaller semiconductor process nodes (e.g., from 7nm to 5nm to 3nm; nm = nanometres) has become increasingly expensive and complex. Given the sheer costs involved, the economic justification for smaller nodes like 2nm is questionable.
Historically, Moore’s Law delivered consistent scaling, but the benefits of moving to the next smaller node (in power efficiency and performance) are diminishing while the costs have skyrocketed. For example, moving from 5nm to 3nm yields only about a 30% improvement in logic scaling and roughly a 20% improvement in power per transistor, much smaller gains than in earlier transitions.
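To put those percentages in perspective, here’s a quick calculation comparing them with the textbook “roughly 2x density per node” cadence of classic Moore’s Law (the 2x baseline is a reference point I’m assuming, not a figure from the podcast):

```python
import math

# Compare the 5nm -> 3nm gains quoted above with classic Moore's Law scaling.
# The 30% figure is from the summary above; the 2x-per-node baseline is the
# textbook historical rate, used here only as a reference point.

classic_density_multiplier = 2.0   # ~2x transistor density per node, historically
n3_density_multiplier = 1.3        # ~30% logic scaling, 5nm -> 3nm

# How many modern node transitions equal one classic 2x density jump?
jumps_needed = math.log(classic_density_multiplier) / math.log(n3_density_multiplier)
print(f"~{jumps_needed:.1f} modern transitions to match one classic 2x jump")
# -> roughly 2.6 transitions, each one dramatically more expensive than the last
```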
The next node improvement (e.g., N2/2nm) requires massive aggregated demand across many companies to fund the development and manufacturing process. The scaling costs are no longer justifiable for most players unless they have immense concentrated spending power (e.g., Apple with iPhones, or AI driving massive demand for leading-edge chips).
The rise of AI is one of the few economic forces actually making advanced nodes like 2nm viable. AI’s insatiable demand for more efficient compute and higher-density chips is driving investment into leading-edge semiconductor manufacturing.
Hope you found it as insightful as I did.
BTW: If you'd like to see more summaries of important podcasts/conversations, let me know by replying to this email.
Cheers,
Teng Yan