Google is turning up the heat in the AI hardware race.
The tech titan has unveiled Ironwood, its newest AI chip, described as “its most powerful and energy-efficient to date,” promising up to a tenfold jump in peak performance over its TPU v5p for large-scale inference and model training.
Announced by Google Cloud executives Amin Vahdat and Mark Lohmeyer, Ironwood TPUs are purpose-built for the most demanding workloads, marking a shift toward what Google calls “the age of inference.”
Inference takes over as AI’s new arena
Google is framing that shift as a turning point for the industry, moving from teaching AI to keeping it running around the clock. In this “age of inference,” the spotlight falls on performance, responsiveness, and the seamless coordination between general-purpose compute and machine learning accelerators.
As models evolve to handle real-time reasoning and decision-making, Google says the next breakthroughs will come from system-level design, rather than just larger datasets or more complex architectures. That philosophy underpins Ironwood: a chip built to power AI that lives in motion.
Pushing AI performance to a new extreme
Google’s new Ironwood TPU is engineered to handle the heaviest AI workloads, from large-scale model training to rapid-fire inference, with a leap in speed and efficiency that redefines its silicon line.
The chip delivers 10× the peak performance of TPU v5p and more than 4× the performance per chip of its predecessor, Trillium (v6e), making it Google’s most advanced processor for both training and serving AI models.
Built with enhanced cooling, reliability, and power efficiency, Ironwood is designed for “planet-scale” deployment, capable of scaling across thousands of chips without losing stability.
Early adopters are already putting that promise to the test. Anthropic plans to tap into up to 1 million TPUs to serve its Claude models, while Lightricks and Essential AI report major boosts in generation quality and training efficiency.
Anthropic Head of Compute James Bradbury said, “Ironwood’s improvements in both inference performance and training scalability will help us scale efficiently while maintaining the speed and reliability our customers expect.”
Where 9,000 chips think as one
Ironwood doesn’t stand alone — it’s the beating heart of Google’s AI Hypercomputer, a system built to make thousands of processors work together as one.
Each superpod links up to 9,216 TPUs via a 9.6 terabit-per-second network, enabling the chips to communicate almost instantly and operate as a unified system. Together, the chips in a superpod share 1.77 petabytes of high-bandwidth memory, removing the data slowdowns that typically hinder large-scale AI processing.
In practice, this means that enormous models, such as chatbots, image generators, or research systems, can run faster, more efficiently, and without interruption. By enabling thousands of chips to work together seamlessly, Google can deliver faster responses, lower latency, and smoother performance for businesses and developers using its AI infrastructure.
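For a sense of scale, the superpod figures quoted above can be broken down with some quick arithmetic. The sketch below is only a back-of-the-envelope calculation: it assumes decimal (SI) units, assumes the 1.77 petabytes is pooled across one superpod's 9,216 chips and split evenly, and uses function names that are purely illustrative.

```python
# Figures as quoted in the article; SI (decimal) units are an assumption.
CHIPS_PER_SUPERPOD = 9_216
SHARED_MEMORY_PB = 1.77
NETWORK_TBIT_PER_S = 9.6


def memory_per_chip_gb(total_pb: float, chips: int) -> float:
    """Even share of the pooled memory per chip, in gigabytes."""
    total_gb = total_pb * 1_000_000  # 1 PB = 1,000,000 GB in SI units
    return total_gb / chips


def tbit_to_tbyte(tbit_per_s: float) -> float:
    """Convert terabits per second to terabytes per second (8 bits per byte)."""
    return tbit_per_s / 8


if __name__ == "__main__":
    per_chip = memory_per_chip_gb(SHARED_MEMORY_PB, CHIPS_PER_SUPERPOD)
    print(f"Memory per chip: about {per_chip:.0f} GB")
    print(f"Network speed: {tbit_to_tbyte(NETWORK_TBIT_PER_S):.1f} TB/s")
```

Under those assumptions, the pooled memory works out to roughly 192 gigabytes per chip, and 9.6 terabits per second is the same as 1.2 terabytes per second.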
To keep that vast web running smoothly, Google relies on optical circuit switching, a self-healing fabric that reroutes traffic around interruptions almost instantly. The company says its fleet has maintained roughly 99.999% uptime since 2020, supported by advanced liquid cooling and automated cluster management.
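To put the uptime claim in everyday terms, an availability percentage can be converted into a yearly downtime budget. The snippet below is a minimal sketch of that conversion, assuming a 365-day year; the function name is hypothetical and nothing here reflects how Google actually measures its fleet.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a non-leap year


def downtime_minutes_per_year(uptime_percent: float) -> float:
    """Minutes of downtime per year implied by an uptime percentage."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    return (100.0 - uptime_percent) / 100.0 * MINUTES_PER_YEAR


if __name__ == "__main__":
    minutes = downtime_minutes_per_year(99.999)
    print(f"99.999% uptime allows about {minutes:.2f} minutes of downtime a year")
```

At 99.999%, the budget comes to a little over five minutes of downtime per year.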
A co-designed software layer, including Cluster Director in Google Kubernetes Engine (GKE), MaxText, vLLM, and GKE Inference Gateway, helps squeeze more performance from the hardware, cutting latency and lowering serving costs for customers operating at planetary scale.
Axion steps in where power meets practicality
Alongside Ironwood, Google expanded Axion, its line of Arm-based CPUs built to power the everyday computing that keeps AI systems running smoothly. The new additions include N4A, now in preview, and C4A Metal, coming soon. Google says N4A is designed to deliver up to twice the price-performance of comparable x86-based virtual machines.
In simpler terms, they promise more computing power for less cost and energy, making it easier and cheaper for businesses to run the supporting tasks that AI depends on, from data processing and analytics to app hosting and system management.
Companies testing Axion say the improvements are already tangible. Vimeo, for instance, reported a 30% boost in video transcoding performance, and ZoomInfo measured a 60% improvement in price-performance for core data workloads. Rise said the new instances helped cut compute consumption by 20% while maintaining low latency and strong margins.
Ironwood and Axion give Google a one-two punch: raw acceleration for AI at scale, paired with efficient, general-purpose compute for everything surrounding it. It’s a full-stack strategy built for a future where intelligence never pauses, and where the cloud itself learns to think faster.