Introduction
In the competitive landscape of AI accelerators, Google has mounted a quiet but serious challenge to Nvidia's established dominance. The TPU v7 Ironwood chips, set to debut in the coming weeks, represent a qualitative leap, combining top-tier per-chip performance with unprecedented scalability. While Nvidia CEO Jensen Huang tends to downplay the threat of specialized AI ASICs, the numbers tell a different story: Google is no longer competing on scale alone, but on per-chip performance as well.
Ironwood TPU v7 Accelerator Performance
The TPU v7 Ironwood marks a turning point in Google's hardware strategy. For the first time, the Mountain View giant's accelerators achieve performance comparable to Nvidia's latest-generation GPUs once the figures are normalized to the same numerical precision.
Each Ironwood chip delivers 4.6 petaFLOPS of dense FP8 compute power, slightly exceeding Nvidia's B200 GPU at 4.5 petaFLOPS and approaching the 5 petaFLOPS of the more powerful GB200 and GB300. This compute power is supported by 192 GB of HBM3e memory with 7.4 TB/s bandwidth, values that fall within the same range as Nvidia's B200 (192 GB HBM with 8 TB/s memory bandwidth).
For chip-to-chip communication, each TPU integrates four ICI links providing 9.6 Tbps of aggregate bidirectional bandwidth, compared to 14.4 Tbps on the B200 and B300. Despite this gap in interconnect speed, TPU v7 compensates with a different architectural approach to network topology.
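For quick reference, those headline figures can be restated as data. The snippet below simply tabulates the vendor numbers quoted above (note the mixed units: HBM bandwidth in terabytes per second, interconnect bandwidth in terabits per second); it is a comparison aid, not a benchmark.

```python
# Vendor-quoted headline specs from this article (dense FP8), restated as data.
# GB200/GB300 are omitted: the article only cites their ~5 PFLOPS compute figure.
specs = {
    "TPU v7 Ironwood": {"pflops": 4.6, "hbm_gb": 192, "hbm_tb_s": 7.4, "links_tbps": 9.6},
    "Nvidia B200":     {"pflops": 4.5, "hbm_gb": 192, "hbm_tb_s": 8.0, "links_tbps": 14.4},
}

for chip, s in specs.items():
    print(f"{chip:16}  {s['pflops']} PFLOPS  {s['hbm_gb']} GB HBM  "
          f"{s['hbm_tb_s']} TB/s HBM  {s['links_tbps']} Tbps interconnect")
```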
"Ironwood is Google's most capable TPU ever, delivering performance 10x that of TPU v5p and 4x that of TPU v6e Trillium."
Scalability: Google's True Ace in the Hole
While per-chip performance positions Ironwood at the same level as the latest Nvidia and AMD chips, the real difference emerges in the ability to scale these accelerators into enormous compute domains. While Nvidia has progressively increased the size of its compute units with NVL72 rack systems connecting 72 Blackwell accelerators via proprietary NVLink interconnect, Google operates on a completely different scale.
Ironwood TPUs are available in pod configurations starting from 256 chips up to 9,216 accelerators in a single compute domain. For contexts requiring even more power, Google offers the ability to scale further to multiple pods. Google's Jupiter datacenter network technology could theoretically support compute clusters up to 43 TPU v7 pods, equivalent to approximately 400,000 accelerators, although it's unclear how large TPU v7 clusters will actually be in production.
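The "approximately 400,000" figure is straightforward arithmetic on the numbers above; a minimal sanity check:

```python
# Scale-out math from the figures quoted above.
chips_per_pod = 9_216      # largest single TPU v7 compute domain
max_pods = 43              # theoretical ceiling on Google's Jupiter network

print(chips_per_pod * max_pods)   # 396,288 -> "approximately 400,000" chips

# For contrast, Nvidia's NVL72 rack couples 72 Blackwell GPUs over NVLink:
print(chips_per_pod // 72)        # 128 -> one Ironwood pod spans 128x an NVL72 domain
```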
3D Torus Network Topology vs Nvidia Architectures
Google's approach to scaling compute fabrics differs substantially from Nvidia's. While the GPU manufacturer has opted for a large, relatively flat switch topology for its rack-scale platforms, Google employs a 3D torus topology, where each chip connects to others in a three-dimensional mesh.
This topology eliminates the need for high-performance packet switches, which are expensive, power-hungry, and can introduce unwanted latency under heavy load. However, the torus mesh implies that more hops may be required for chip-to-chip communication. As the torus grows, so does the potential for chip-to-chip latency.
By using switches, Nvidia and AMD ensure their GPUs are at most two hops away from any other chip in the domain. Which approach is better depends on the specific workload: some workloads may benefit from multi-hop topologies like the 2D and 3D toruses used in Google's TPU pods, while others may perform better on the smaller switched compute domains offered by Nvidia and AMD's rack designs.
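The trade-off is easy to quantify. In a wraparound torus, the farthest chip along a dimension of size d is d // 2 hops away, so worst-case distance grows with pod size, while a switched fabric pins it at two hops. The 16 × 24 × 24 shape below is purely illustrative (it yields 9,216 chips); the actual TPU v7 pod dimensions are not confirmed here.

```python
# Worst-case shortest-path hops in a wraparound 3D torus vs. a switched fabric.
# The 16 x 24 x 24 shape is an assumption for illustration: 16 * 24 * 24 = 9,216.
def torus_max_hops(dims):
    # With wraparound links, the farthest node along a dimension of size d
    # is d // 2 hops away; the worst case sums across dimensions.
    return sum(d // 2 for d in dims)

print(torus_max_hops((16, 24, 24)))   # 8 + 12 + 12 = 32 hops, worst case
print(torus_max_hops((4, 4, 4)))      # 6 hops for a small 64-chip slice

# A switched domain (Nvidia/AMD rack scale) is bounded at 2 hops:
# chip -> switch -> chip, independent of domain size.
```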
Optical Circuit Switching: Innovative Network Technology
To manage the complexity of its TPU pods, Google employs a different switching technology that allows slicing and dicing TPU pods into various shapes and sizes to better suit internal and customer workloads. Instead of traditional packet switches, Google uses optical circuit switches (OCS).
An OCS is more akin to a 20th-century telephone switchboard than to a modern packet switch. These appliances use various methods, including MEMS devices, to patch one TPU directly to another. Because the connection is made physically, one port patched to another, it introduces little if any additional latency.
An additional benefit of OCS is its contribution to fault tolerance: if a TPU fails, OCS appliances can drop it from the mesh and replace it with a working part, ensuring operational continuity without significant interruptions.
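Conceptually, an OCS is a programmable patch panel: a port-to-port mapping with no packet processing in the data path. The toy model below (all names are invented for illustration; real OCS control planes are far more involved) sketches the repair idea: when a TPU fails, its peer is re-patched to a spare.

```python
# Toy model of an optical circuit switch as a patch panel (illustrative only).
class OpticalCircuitSwitch:
    def __init__(self):
        self.circuits = {}            # port -> port, a physical light path

    def connect(self, a, b):
        self.circuits[a] = b
        self.circuits[b] = a

    def swap_in_spare(self, failed, spare):
        # Fault tolerance: re-patch the failed chip's peer to a healthy spare,
        # restoring the mesh with no packet switching along the way.
        peer = self.circuits.pop(failed)
        self.circuits.pop(peer, None)
        self.connect(peer, spare)

ocs = OpticalCircuitSwitch()
ocs.connect("tpu-0", "tpu-1")
ocs.swap_in_spare("tpu-1", "tpu-spare")
print(ocs.circuits["tpu-0"])          # -> tpu-spare
```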
Google's Established Experience with Large-Scale TPUs
Google has been using 2D and 3D toruses in conjunction with OCS appliances in its TPU pods since at least 2021, when TPU v4 debuted. The tech giant is no stranger to operating massive compute fabrics in production: TPU v4 supports pods up to 4,096 chips, while TPU v5p more than doubled that capacity to 8,960 chips.
The jump to 9,216-chip pods with Ironwood shouldn't represent a significant obstacle for Google, given its consolidated experience. The availability of these massive compute domains has certainly caught the attention of major AI model builders, including those for whom Google's Gemini models are a direct competitor.
Anthropic and Google TPU Adoption
Anthropic is among Google's largest customers, having announced plans to use up to one million TPUs to train and serve future generations of its Claude models. Anthropic's embrace of Google's TPU technology isn't surprising, considering the model developer is also deploying its workloads across hundreds of thousands of Amazon's Trainium 2 accelerators under Project Rainier, accelerators that likewise use 2D and 3D torus mesh topologies in their compute fabrics.
The Growing Threat of AI ASICs to Nvidia
While Jensen Huang may downplay the threat of AI ASICs to his GPU empire, it's hard to ignore the fact that chips from companies like Google, Amazon, and others are rapidly catching up in terms of hardware capabilities and network scalability. In this competitive context, software often ends up being the deciding factor in accelerator choice.
Perhaps this is why analysts keep bringing up the question quarter after quarter during Nvidia conference calls. Competition in the AI accelerator ecosystem is intensifying, and Google's approach with Ironwood TPUs demonstrates that alternative paths to traditional GPU supremacy exist.
Conclusion
Google's TPU v7 Ironwood represents a turning point in the AI accelerator market. By combining per-chip performance comparable to Nvidia's most powerful accelerators with scalability far exceeding what the competition offers, Google has shown that raw per-chip specs aren't the only factor that matters. The efficiency with which accelerators can be scaled in production, and the ability to adapt to diverse workloads through network designs like the 3D torus and optical circuit switching, may prove equally important.
With high-profile customers like Anthropic betting on this technology for their most advanced AI models, it's clear that Google's TPUs are no longer just a niche alternative to Nvidia GPUs, but a competitive strategic choice for those operating at enterprise scale in training and deploying large-scale artificial intelligence models.
FAQ
What are Google's Ironwood TPUs?
Ironwood TPUs (TPU v7) are the seventh generation of AI accelerators developed by Google, designed to train and serve machine learning models with performance comparable to Nvidia Blackwell GPUs.
How many Ironwood TPUs can be connected in a single pod?
A single TPU v7 Ironwood pod can contain from 256 up to 9,216 accelerators, with the theoretical possibility of scaling to 43 pods (approximately 400,000 chips) via Google's Jupiter network.
What are the performance specs of TPU v7 Ironwood?
Each Ironwood TPU offers 4.6 petaFLOPS of FP8 compute, 192 GB of HBM3e memory with 7.4 TB/s bandwidth, and 9.6 Tbps of chip-to-chip interconnect via four ICI Links.
How does Google's TPU network topology differ from Nvidia?
Google uses a 3D torus mesh topology with optical circuit switches (OCS), while Nvidia employs packet switches in flatter architectures, ensuring at most two hops between GPUs.
Why does Anthropic use Google's TPUs?
Anthropic announced using up to one million TPUs for its Claude models, benefiting from the extreme scalability and competitive performance offered by Google accelerators.
Do Ironwood TPUs represent a threat to Nvidia?
Yes, TPU v7 Ironwood competes directly with Nvidia GPUs in per-chip performance and exceeds Nvidia in the ability to scale accelerators into unified compute domains of much larger sizes.
What is optical circuit switching technology in TPUs?
Optical circuit switches (OCS) are devices that physically connect TPUs like telephone switchboards, eliminating packet switch latency and improving fault tolerance.
When will TPU v7 Ironwood be available?
Google announced that TPU v7 Ironwood will be available for general use in the coming weeks, making this new generation of accelerators accessible to cloud customers.