NVIDIA: GTC 2025 & Investment Memo

Keynote insights. And could this be our 2nd shot at the trillion-dollar AI wave?

What a week.

Trump’s new tariff threats hit like a sledgehammer, sending global risk assets into a spin cycle. The NASDAQ is down 20% this year. The last time that happened was during COVID-19 (2020) and the Great Financial Crisis (2008).

Markets are panicking, and the global “soft landing” narrative suddenly feels like a fever dream. Stagflation memes are back. Risk is off. Vibes? Shaky.

But through it all, one trend refuses to flinch:

AI is still going vertical.

Every week brings more signal: new models, new agents, new use cases. And honestly, from where I’m sitting, it feels like the pace is only speeding up.

I’ve been thinking a lot about how to go deeper into this wave. Crypto AI is one slice of the pie, sure. But if you zoom out, the bigger opportunity becomes obvious.

To play this wave right, you need to be thinking about the picks and shovels. The infrastructure stack. The supply chain. The base layer that’s powering all of AI today.

With Trump tossing tariff grenades and markets getting jittery, this is the reset we didn’t know we needed. A rare second shot to position ahead of what feels like the next AI supercycle.

Moments like this don’t come often. It feels like a great time to accumulate positions in companies with great fundamentals and sit tight for the long game.

Here’s my thesis, plain and simple:

  1. AI is the most important secular trend of the next 30 years. We are barely at the starting line. It’s been less than three years since ChatGPT went live.

  2. Robotics—especially humanoid robotics and physical AI—will be the next trillion-dollar market within the next five years.

I’ve written before about missing NVIDIA the first time around. Painfully. Easily one of my biggest investment regrets. Watching it run while I sat on the sidelines? Yeah, that one stung.

Fast forward to now: NVIDIA’s down over 30% in just three months.

So in line with my thesis, I’m doing what any semi-rational (read: mildly obsessed) investor would do—going back to the source. Rewinding the tape, rewatching the footage. And there’s no better place to start than NVIDIA’s GTC from two weeks ago.

Jensen Huang’s two-hour keynote was honestly better than anything on Netflix. Equal parts entertaining, educational, and low-key mind-blowing. When the little blue humanoid robot walked out at the end, I legit got goosebumps. This isn’t sci-fi anymore. It’s happening. Robots are here.

I pulled together a full breakdown of the keynote, with some quick commentary, that you can digest in one sitting. Summarized with help from Gemini, my go-to model for long-context recaps.

I went a step further. I asked OpenAI’s Deep Research to create a detailed investment memo on NVDA. Is it still a buy? Is the generational compounding story intact?

It spat out a full memo in under 15 minutes. Honestly, I thought it was better than anything I could come up with myself. With the right prompts, AI is already a cheaper, faster, better researcher than 90% of MBA grads.

Here’s the preview:

Crafted by OpenAI’s Deep Research

You can read the full, detailed NVDA investment memo here (7,000+ words). It’s pretty good.

Also, before we go deeper: we publish punchy, high-signal recaps of the best Crypto and AI podcasts every day. You can catch them all on our new podcast site.

Here we go.

Key Takeaways from NVIDIA’s GTC 2025 Keynote

"…We are now seeing the inflection point happening in the world's data center build outs... the computer has become a generator of tokens, not a retrieval of files... I call them AI factories."

"…Blackwell MVLink 72 with Dynamo is 40 times the performance AI Factory performance of Hopper."

NVIDIA CEO Jensen Huang’s message was clear: AI is compounding fast. And with that comes a relentless demand for exponentially more compute.

I’ve also included some of my quick commentary/opinions marked with 💡

GeForce, CUDA, and AI's Impact on Graphics

  • Huang traces Nvidia's AI journey back to GeForce and the introduction of CUDA (Compute Unified Device Architecture), a parallel computing platform and programming model created by Nvidia. CUDA enabled developers to use Nvidia GPUs for general-purpose processing, which became foundational for the AI revolution.

  • He showcases the new GeForce RTX 5090 (Blackwell generation), noting its 30% smaller volume and improved energy dissipation compared to the 4090, attributing the performance leaps to AI.

  • Huang explains how AI now revolutionizes graphics through techniques like real-time path tracing, where AI predicts the vast majority of pixels. "For every pixel that we mathematically rendered, artificial intelligence inferred the other 15," Huang states, highlighting the precision and temporal stability required.

The Evolution of AI: From Perception to Physical AI

  • Huang outlines the rapid evolution of AI over the past decade, starting with perception AI (computer vision, speech recognition) and moving into generative AI over the last five years. Generative AI refers to AI models capable of creating new content (text, images, code, etc.) based on the data they were trained on.

  • He introduces the concept of agentic AI, the current breakthrough, describing it as AI with agency – the ability to perceive context, reason, plan, take action, and use tools (like browsing websites). Reasoning is identified as the core new capability.

  • The next wave, Huang predicts, is physical AI, which understands the physical world (friction, inertia, cause-and-effect, object permanence). This understanding is crucial for enabling the next era of robotics. Each AI wave opens new markets and brings more partners into the ecosystem.

Core Challenges in Scaling AI

  • Huang identifies three fundamental challenges that must be solved to enable each AI wave:

    • Solving the data problem: AI needs vast amounts of digital experience to learn.

    • Solving the training problem without human-in-the-loop limitations: Enabling AI to learn at superhuman rates and scale.

    • Solving the scaling problem: Finding algorithms where more resources lead to smarter AI (the scaling law).

  • He asserts the world underestimated the computational scaling required, especially with agentic AI and reasoning, suggesting compute needs are "easily a hundred times more" than anticipated a year ago.

Reasoning AI's Exponential Compute Demand

  • Huang explains why reasoning demands so much more compute. Unlike simple retrieval or one-shot generation, reasoning involves step-by-step problem-solving (Chain of Thought), exploring multiple approaches, consistency checking, and self-verification.

  • This process generates vastly more tokens – the fundamental units of data processed and generated by AI models (like parts of words or pixels). Instead of generating one token after another for a final answer, reasoning generates sequences of tokens representing intermediate steps, dramatically increasing the total token count.

  • This translates into either 100x more tokens generated, or 10x more tokens generated 10x faster to stay responsive. Either way, the result is a ~100x increase in compute demand for inference.
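
To make that arithmetic concrete, here's the back-of-envelope version. The token counts are my own illustrative assumptions, not keynote figures:

```python
# Illustrative numbers only (not from the keynote)
one_shot_tokens  = 100     # a direct, non-reasoning answer
reasoning_tokens = 1_000   # chain-of-thought trace: ~10x more tokens
speedup_needed   = 10      # ...streamed ~10x faster to stay responsive

compute_factor = (reasoning_tokens / one_shot_tokens) * speedup_needed
print(compute_factor)      # 100.0 -> the ~100x inference compute estimate
```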

Training Reasoning AI: Reinforcement Learning and Synthetic Data

  • To teach AI reasoning without relying solely on limited human demonstration, Huang highlights the breakthrough of reinforcement learning with verifiable results. This involves training AI on problem spaces with known answers (math, logic, puzzles like Sudoku), allowing the AI to attempt solutions millions of times and rewarding progress.

  • This method generates trillions of tokens of synthetic data (artificially generated data used for training), overcoming the bottleneck of real-world data scarcity. This combination puts enormous demands on compute infrastructure for training.

💡 Heavy use of reinforcement learning and synthetic data means one thing: an insatiable, growing demand for serious training infrastructure.
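
If you're wondering what "verifiable results" means in practice, here's a toy sketch of the loop. A random guesser stands in for the model, and every name here is hypothetical; the point is that the grader is just a program, so no human is needed anywhere:

```python
import random

def verifiable_reward(attempt: int, solution: int) -> float:
    """The problem has a known answer, so grading is mechanical."""
    return 1.0 if attempt == solution else 0.0

def generate_synthetic_data(num_problems: int, attempts_per_problem: int):
    trajectories = []
    for _ in range(num_problems):
        a, b = random.randint(1, 99), random.randint(1, 99)
        solution = a + b                      # verifiable ground truth
        for _ in range(attempts_per_problem):
            attempt = random.randint(2, 198)  # stand-in for a model's answer
            reward = verifiable_reward(attempt, solution)
            # Every attempt, right or wrong, becomes synthetic training data
            trajectories.append(((a, b), attempt, reward))
    return trajectories

data = generate_synthetic_data(num_problems=1_000, attempts_per_problem=8)
print(len(data), "synthetic trajectories")
```

Run that loop with a real model, millions of times, across math, logic, and code, and you get the trillions of synthetic tokens Huang is talking about.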

Infrastructure Growth and AI Factories

  • Huang shows data illustrating the dramatic increase in Hopper GPU shipments to the top four Cloud Service Providers (CSPs: Amazon AWS, Microsoft Azure, Google Cloud Platform, Oracle Cloud Infrastructure), indicating the market's response to AI's compute demands. He contrasts the peak Hopper year with the start of Blackwell shipments, signaling an inflection point.

  • He forecasts continued massive growth in data center capital expenditure, potentially reaching a trillion dollars annually soon. This growth is driven by the platform shift from general-purpose computing to accelerated computing (GPUs/AI accelerators) and the recognition that future software requires capital investment in compute infrastructure.

  • Huang introduces the term "AI Factories": data centers specifically designed for generating AI tokens, analogous to traditional factories producing physical goods. This reframes data center investment from IT cost centers to revenue-generating production facilities.

💡 The "AI Factory" concept fundamentally changes the economics of data centers. Investments in the infrastructure (hardware, cooling, networking, software) enabling these factories are central to the AI value chain.

Accelerated Computing Beyond AI: The CUDA-X Ecosystem

  • Huang emphasizes that accelerated computing extends beyond AI, showcasing a wide array of CUDA-X libraries – specialized software libraries built on CUDA to accelerate specific domains.

  • Examples include:

    • cuNumeric: Accelerating NumPy (a fundamental Python library for numerical computing); see the drop-in example after this list.

    • cuLitho: Accelerating computational lithography (used in semiconductor manufacturing), in partnership with TSMC, Samsung, and ASML. Huang predicts all lithography will run on CUDA within 5 years.

    • Aerial: Turning GPUs into 5G radios, enabling "AI-RAN" (AI on Radio Access Networks).

    • cuOpt: Accelerating optimization problems (logistics, scheduling, supply chain). Nvidia is open-sourcing this.

    • Parabricks: For gene sequencing.

    • MONAI: For medical imaging.

    • Earth-2: For high-resolution weather prediction.

    • cuQuantum: For quantum computing research and simulation.

    • cuDSS: New sparse solvers crucial for CAE (Computer-Aided Engineering) and EDA (Electronic Design Automation), accelerating chip design itself.

    • cuDF: Accelerating data frames (like Pandas and Spark).

    • Warp: A Python library for physics simulation on CUDA.

  • He stresses that the value comes from the combination of these libraries and the vast, ubiquitous CUDA install base, allowing developers to reach a massive market.
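
One example of why that install base matters: cuNumeric's whole pitch is that existing NumPy code doesn't change. A minimal sketch, assuming cuNumeric is installed (unsupported operations fall back to plain NumPy, as I understand it):

```python
import cunumeric as np   # drop-in for: import numpy as np

# Unchanged NumPy code below, now GPU-accelerated
a = np.random.rand(4096, 4096)
b = np.random.rand(4096, 4096)
print((a @ b).sum())
```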

AI Beyond the Cloud: Enterprise, Edge, and Autonomous Vehicles

  • While AI started in the cloud, Huang states it will proliferate everywhere, requiring different configurations and domain-specific adaptations for enterprise IT, manufacturing, robotics, and edge computing.

  • He highlights the emergence of specialized GPU Clouds (like CoreWeave) focusing solely on hosting GPUs.

  • Edge AI: A major announcement involves a partnership with Cisco, T-Mobile, CuspAI, and ODC to build a full-stack AI platform for radio networks in the US, aiming to infuse AI into the $100B annual investment in telecom infrastructure for optimized signal processing and communication. Huang envisions AI revolutionizing communications through context and prior knowledge, similar to human interaction.

  • Autonomous Vehicles (AVs): Huang recounts Nvidia's decade-long investment in AVs, driven by the potential seen in early computer vision AI (AlexNet). Nvidia provides technology for data centers (training/simulation) and in-car compute, working with companies like Tesla, Waymo, and Wayve.

  • A significant partnership with General Motors (GM) is announced, where Nvidia will collaborate on GM's future self-driving fleet, covering AI for manufacturing, enterprise operations, and in-vehicle systems.

  • Automotive Safety (Halos): Huang emphasizes Nvidia's deep investment in safety, with rigorous assessment (7M lines of code reviewed), adherence to standards (ISO 26262), and technologies ensuring diversity, monitoring, transparency, and explainability.

  • AV Development Tech: A video showcases how Nvidia uses Omniverse (a platform for building and operating metaverse applications and digital twins) and Cosmos (a generative AI model for creating virtual worlds) for AV development, including model distillation, closed-loop training in simulation, and synthetic data generation to improve robustness and handle diverse environments.

Blackwell and the Rise of AI Factories

  • Huang unveils the Blackwell platform, calling it a "fundamental transition in computer architecture." He contrasts the previous Hopper generation (HGX system with 8 GPUs connected via NVLink within the board, then PCIe to CPUs, and InfiniBand for scale-out) with the new Blackwell architecture.

  • Blackwell Architecture:

    • Features disaggregated NVLink Switches. NVLink is Nvidia's high-speed GPU interconnect technology. Disaggregating the switches allows for much denser compute configurations.

    • Compute nodes are densely packed and fully liquid-cooled.

    • A single rack contains 72 Blackwell GPUs (each Blackwell "chip" actually contains two dies) connected via the external NVLink switches (the NVLink 72 configuration). The rack packs 600,000 components and 2 miles of cables, weighs 3,000 lbs, and consumes 120 kW.

    • This single rack delivers one ExaFLOPS of AI compute (a quintillion floating-point operations per second). Huang describes this as the "ultimate scale-up."

💡 The shift to liquid cooling and disaggregated high-speed interconnects (NVLink) represents a major architectural change in data center design, impacting facility requirements, power density, and supply chains.

The Inference Challenge: Optimizing AI Factories

  • Huang argues that inference (running trained AI models to generate outputs/predictions/tokens) is the "ultimate extreme computing problem" because it directly impacts revenue and profitability for AI services. Efficiency and performance are paramount.

  • He presents a crucial chart plotting Tokens per Second per User against Total Tokens per Second per Factory. The goal is to push the curve outwards (maximize area under the curve), balancing fast responses for smart AI with high overall token generation for serving many users.

  • Achieving this requires immense computational power (FLOPS) and massive memory bandwidth.
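
Here's a toy model of that trade-off; the timing constants are invented, and real curves come from profiling actual hardware. Batching more users raises total factory throughput but slows each individual stream:

```python
# Toy decode-step model: fixed overhead plus a per-user cost (made-up numbers)
for batch in (1, 8, 64, 512):
    step_time = 0.01 + 0.0001 * batch    # seconds per decode step
    per_user = 1 / step_time             # tokens/sec each user sees
    factory = batch / step_time          # tokens/sec the whole factory emits
    print(f"batch={batch:4d}  per-user={per_user:6.1f} tok/s  "
          f"factory={factory:8.1f} tok/s")
```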

Reasoning Models and Inference Complexity

  • A demo contrasts a traditional Large Language Model (LLM) with a reasoning model for a complex task (wedding table seating). The traditional LLM answers quickly (<500 tokens) but incorrectly. The reasoning model uses significantly more tokens (>8,000) to think through constraints and arrive at the correct answer.

  • This highlights the compute cost of reasoning. Huang explains the technical complexity of running large models (trillions of parameters) across many GPUs, requiring sophisticated parallelization techniques (tensor parallel, pipeline parallel, expert parallel).

  • He details the two phases of inference:

    • Pre-fill: Processing the initial prompt and context (FLOPS-intensive).

    • Decode: Generating output tokens one by one (bandwidth-intensive, requiring pulling the entire model or relevant parts repeatedly for each token).

💡 Pre-fill vs. decode have wildly different compute demands—and when you layer in parallelization strategies, it’s clear: AI needs ultra-flexible, fully programmable infrastructure to scale.
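
To make the two phases concrete, here's a framework-agnostic sketch. The `model` object and its methods are hypothetical stand-ins, not any real API:

```python
def prefill(model, prompt_tokens):
    # Phase 1: process the whole prompt in one big batched pass.
    # FLOPS-bound: large matrix multiplies over all prompt tokens at once.
    logits, kv_cache = model.forward_all(prompt_tokens)  # builds the KV cache
    return logits.argmax(), kv_cache                     # first output token

def decode(model, token, kv_cache, max_new_tokens):
    # Phase 2: emit tokens one at a time. Every step re-reads the weights
    # plus the growing KV cache, so this phase is memory-bandwidth-bound.
    tokens = [token]
    for _ in range(max_new_tokens - 1):
        logits, kv_cache = model.forward_one(tokens[-1], kv_cache)
        tokens.append(logits.argmax())    # greedy decoding, for brevity
        if tokens[-1] == model.eos_token:
            break
    return tokens
```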

Nvidia Dynamo: The AI Factory Operating System

  • To manage this complexity, Nvidia announces Dynamo, described as the "operating system of an AI Factory." Dynamo dynamically allocates resources, manages different parallelism strategies, handles workload scheduling (like disaggregating pre-fill and decode workloads across different GPU groups), and manages the KV Cache (the contextual information the model keeps track of during generation).

  • Dynamo is open source, and Nvidia is partnering with companies like Perplexity AI on its development.
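
The disaggregation idea is worth a sketch. To be clear, this is not the Dynamo API, just a toy of the concept: separate GPU pools for the two inference phases, with the KV cache handed between them.

```python
from collections import deque

prefill_pool = deque(f"gpu{i}" for i in range(2))     # FLOPS-heavy phase
decode_pool  = deque(f"gpu{i}" for i in range(2, 8))  # bandwidth-heavy phase

def serve(prompt, run_prefill, run_decode):
    gpu = prefill_pool.popleft()            # phase 1 on the prefill pool
    kv_cache = run_prefill(gpu, prompt)
    prefill_pool.append(gpu)

    gpu = decode_pool.popleft()             # phase 2 on the decode pool
    tokens = run_decode(gpu, kv_cache)      # KV cache handed across pools
    decode_pool.append(gpu)
    return tokens
```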

Blackwell Performance Leap

  • Using the inference factory chart (normalized to 1 MW of power), Huang compares Hopper and Blackwell:

    • Hopper (baseline)

    • Blackwell with NVLink 8 (faster chip)

    • Blackwell using FP4 (4-bit floating point) precision, a technique that shrinks model weights and activations to save memory, bandwidth, and power, often with minimal accuracy loss. This significantly boosts energy efficiency (see the toy quantization sketch after this list).

    • Blackwell scaled up with NVLink 72 (leveraging the full rack architecture).

    • Blackwell NVLink 72 with Dynamo optimizing workloads.

  • The result: At iso-power (same power consumption), Blackwell delivers 25x the performance of Hopper on the benchmark task, considering the area under the latency/throughput curve. For a specific reasoning model workload, Blackwell is 40x faster than Hopper.

  • Huang starkly states, "...when Blackwell starts shipping in volume, you couldn't give Hoppers away," emphasizing the generational leap and the economic imperative to adopt the new architecture for AI factories. He shows a 100 MW factory comparison: Hopper needs 1400 racks, while Blackwell achieves higher throughput with just 160 racks.
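
On the FP4 point above: here's a toy of what 4-bit weight quantization does. The value grid mimics the E2M1 layout commonly cited for FP4, but this is a teaching sketch, not NVIDIA's implementation:

```python
import numpy as np

# Magnitudes representable by an E2M1-style 4-bit float, plus their negatives
levels = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
levels = np.unique(np.concatenate([-levels, levels]))  # 15 distinct values

def quantize_dequantize(w):
    scale = np.abs(w).max() / levels.max()  # map weights onto the 4-bit grid
    idx = np.abs(levels[None, :] - (w / scale)[:, None]).argmin(axis=1)
    return levels[idx] * scale              # snap to nearest representable value

w = np.random.randn(8).astype(np.float32)
w4 = quantize_dequantize(w)
print(np.abs(w - w4).max())   # small error, for 4x less memory than FP16
```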

AI Factory Digital Twin

  • Building these complex AI factories requires meticulous planning. Nvidia showcases using Omniverse to create a digital twin of a planned 1 Gigawatt AI factory. This allows collaborative design and simulation before physical construction, integrating layout, power (Vertiv, Schneider Electric), cooling (Cadence Reality), and network topology (Nvidia Air).

  • This digital twin approach optimizes Total Cost of Ownership (TCO), Power Usage Effectiveness (PUE), reduces errors, and accelerates deployment.

Nvidia Hardware Roadmap: Annual Cadence

  • Huang unveils a clear, multi-year roadmap with an annual cadence:

    • Blackwell: Shipping now.

    • Blackwell Ultra: Second half of 2025 (1.5x FLOPS, 1.5x HBM, 2x Network Bandwidth).

    • Vera Rubin: Second half of 2026 (named after astronomer Vera Rubin). New CPU (Vera), new GPU (Rubin), CX9 networking, NVLink 6, HBM4. Features NVLink 144 (connecting 144 GPU dies). Clarification: Henceforth, NVLink numbers refer to connected GPU dies, not necessarily packaged chips.

    • Rubin Ultra: Second half of 2027. Features NVLink 576 for extreme scale-up (15 ExaFLOPS per rack, 4.6 PetaBytes/sec scale-up bandwidth, 600 kW per rack).

  • He visually compares the scale-up density, showing how Rubin packs significantly more power into the same physical footprint as Blackwell. The roadmap aims to dramatically reduce the cost per unit of computation (Watts / (FLOPS * Bandwidth)).

💡 NVIDIA dropping a predictable annual roadmap is a big deal—it gives hyperscalers (aka their biggest customers) the clarity they need to plan infra and capex cycles with confidence.

Scaling Out: Networking and Silicon Photonics

  • While NVLink handles scale-up within racks, InfiniBand and Spectrum-X Ethernet handle scale-out across the data center. Huang highlights the success of Spectrum-X in bringing InfiniBand-like performance (low latency, congestion control) to Ethernet, making high-performance AI networking more accessible.

  • A partnership with Cisco is emphasized, where Cisco integrates Spectrum-X to bring AI-ready networking to enterprises.

  • The next challenge is scaling out to hundreds of thousands or millions of GPUs. Traditional optical transceivers (pluggable modules converting electrical signals to optical) become a major power bottleneck (e.g., 180W and $6000 per GPU for a large cluster).

  • Nvidia's solution: Silicon Photonics with Co-Packaged Optics (CPO). They announce their first CPO system using Micro-Ring Resonators (MRM), a technology chosen for its density and power efficiency advantages over traditional Mach-Zehnder modulators used in telecom. This involves complex integration of photonic and electronic ICs directly on the switch package, eliminating pluggable transceivers for switch-to-switch links.

  • Benefit: This drastically reduces power consumption (saving tens of megawatts in a large data center, freeing power for actual compute) and enables much higher bandwidth density (512 ports per switch demonstrated). Silicon photonics switches are planned for InfiniBand (second half of 2025) and Spectrum-X (2026).

💡 Silicon photonics is a game-changer for data center networking—critical tech that’ll power the next generation of large-scale AI systems.

Enterprise AI: New Hardware and Software Stack

  • Huang addresses bringing AI to enterprises, noting the entire stack needs reinvention – processor, OS, applications, orchestration, and even storage (moving towards semantics-based retrieval). He envisions billions of "digital workers" (AI agents) augmenting the human workforce.

  • New Hardware:

    • DGX Spark: A compact, desktop system ("like the original DGX-1 with Pym Particles") featuring a MediaTek CPU connected via NVLink to a Blackwell GPU with 128 GB of unified memory, delivering 1 PetaFLOPS. Positioned as the development platform for millions of software engineers and data scientists. Available for reservation by GTC attendees.

    • DGX Station: A new personal workstation featuring Grace Blackwell (CPU+GPU), liquid-cooled, 72 CPU cores, 20 PetaFLOPS, with PCIe slots for additional GeForce cards.

  • Enterprise Software & Ecosystem:

    • Nvidia is working with the storage industry (Dell, HPE, NetApp, Pure Storage, VAST, etc.) to create GPU-accelerated, semantics-based storage systems.

    • A slide highlights Dell's comprehensive offerings of Nvidia AI infrastructure and software.

    • Nvidia NIMs (Nvidia Inference Microservices): Pre-built, optimized containers for deploying AI models. Huang announces an open-source reasoning model (like the R1 model shown earlier) available via NIMs, runnable anywhere from DGX Spark to the cloud.

    • Numerous enterprise partners (Accenture, AMD, AT&T, BlackRock, Cadence, Capital One, Deloitte, SAP, ServiceNow) are shown integrating Nvidia models, NIMs, and libraries into their AI frameworks and platforms.

Robotics and Physical AI

  • Huang declares the "time has come for robots," driven by labor shortages and the potential of physical AI. He outlines Nvidia's three-computer strategy for robotics: simulation/training computer, testing computer, and the robot's onboard computer.

  • Development Pipeline: A video details the workflow:

    • Using Omniverse and Cosmos to generate vast amounts of diverse, photorealistic synthetic data for training robot policies.

    • Using Isaac Lab (a physics simulation application for robotics) for reinforcement learning and imitation learning (teaching robots by demonstration or trial-and-error in simulation).

    • Using Omniverse for Sim-to-Real transfer and testing policies in digital twins with realistic physics before real-world deployment.

    • Using Isaac Mega and Omniverse blueprints for testing fleets of robots interacting in complex environments (e.g., a virtual Blackwell factory with Foxconn robots).

  • Isaac GR00T N1: Nvidia introduces GR00T N1, a generalist foundation model specifically for humanoid robots. It uses synthetic data and simulation learning, featuring a dual-system architecture (fast/slow thinking) for perception, reasoning, planning, and action execution. GR00T N1 is designed to be adaptable to different humanoid robot embodiments and tasks. It is announced as open source.

  • Newton Physics Engine: To address the need for highly accurate physics simulation for training fine motor skills and tactile feedback, Nvidia announces Newton, a new physics engine developed in partnership with DeepMind and Disney Research. It's designed for GPU acceleration, super real-time performance, and integration with robotics frameworks like MuJoCo. A demo shows a robot ("Blue") interacting realistically with objects simulated by Newton.

💡 NVIDIA is going full-stack on physical AI—hardware, sim tools like Omniverse and Isaac, foundation models like GR00T, physics engines like Newton. It’s building the full platform to fast-track robotics development, aiming straight at what could be the next trillion-dollar industry.

Hope you found this valuable. For more Crypto and AI podcast summaries with similarly detailed show notes, check out our website. We keep things fresh there.

Stay safe and sharp!

Cheers,

Teng Yan

This article is intended solely for educational purposes and does not constitute financial advice. It is not an endorsement to buy or sell assets or make financial decisions. Always conduct your own research and exercise caution when making investments.
