Google Unveils Ironwood TPUs and Axion for Inference Era

Google Unveils Ironwood, Its ‘Most Powerful’ and ‘Energy-Efficient’ AI Chip to Date

Google unveils Ironwood, its most powerful TPU, for the age of inference, and Axion Arm VMs promising up to 2× better price-performance for AI workloads.

Written By
Liz Ticong
Nov 7, 2025

Google is turning up the heat in the AI hardware race.

The tech titan has unveiled Ironwood, its newest AI chip, described as “its most powerful and energy-efficient to date,” promising up to a tenfold leap in peak performance over TPU v5p for large-scale inference and model training.

Announced by Google Cloud executives Amin Vahdat and Mark Lohmeyer, Ironwood TPUs are purpose-built for the most demanding workloads, marking a shift toward what Google calls “the age of inference.”

Inference takes over as AI’s new arena

Google is framing that shift as a turning point for the industry, moving from teaching AI to keeping it running around the clock. In this “age of inference,” the spotlight falls on performance, responsiveness, and the seamless coordination between general-purpose compute and machine learning accelerators.

As models evolve to handle real-time reasoning and decision-making, Google says the next breakthroughs will come from system-level design, rather than just larger datasets or more complex architectures. That philosophy underpins Ironwood: a chip built to power AI that lives in motion.

Pushing AI performance to a new extreme

Google’s new Ironwood TPU is engineered to handle the heaviest AI workloads, from large-scale model training to rapid-fire inference, with a leap in speed and efficiency that redefines its silicon line.

The chip delivers 10× the peak performance of TPU v5p and more than 4× the performance per chip of its predecessor, Trillium (v6e), making it Google’s most advanced processor for both training and serving AI models.

Built with enhanced cooling, reliability, and power efficiency, Ironwood is designed for “planet-scale” deployment, capable of scaling across thousands of chips without losing stability.

Early adopters are already putting that promise to the test. Anthropic plans to tap into up to 1 million TPUs to serve its Claude models, while Lightricks and Essential AI report major boosts in generation quality and training efficiency.

Anthropic Head of Compute James Bradbury said, “Ironwood’s improvements in both inference performance and training scalability will help us scale efficiently while maintaining the speed and reliability our customers expect.”

Where 9,000 chips think as one

Ironwood doesn’t stand alone — it’s the beating heart of Google’s AI Hypercomputer, a system built to make thousands of processors work together as one.

Each superpod links up to 9,216 TPUs via a 9.6 terabit-per-second inter-chip network, enabling the chips to communicate almost instantly and operate as a unified system. Together, the chips in a superpod share 1.77 petabytes of high-bandwidth memory, removing the data slowdowns that typically hinder large-scale AI processing.
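To put the superpod figures in perspective, the short Python sketch below divides the quoted 1.77 petabytes of shared memory by the 9,216 chips in a full superpod. It assumes decimal units (1 PB = 1,000,000 GB) and an even split across chips. Neither assumption is stated in the announcement, and the function name is illustrative only.

```python
# Back-of-the-envelope: memory share per chip in a full superpod.
# Assumptions (not from the announcement): decimal units and an even split.

CHIPS_PER_SUPERPOD = 9_216   # figure quoted in the article
SHARED_MEMORY_PB = 1.77      # figure quoted in the article
GB_PER_PB = 1_000_000        # decimal petabyte, an assumption


def memory_per_chip_gb(total_pb: float, chips: int) -> float:
    """Return the average memory share per chip, in gigabytes."""
    return total_pb * GB_PER_PB / chips


per_chip = memory_per_chip_gb(SHARED_MEMORY_PB, CHIPS_PER_SUPERPOD)
print(f"{per_chip:.0f} GB per chip")  # prints "192 GB per chip"
```

Under those assumptions, each chip contributes roughly 192 GB to the shared pool.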

In practice, this means that enormous models, such as chatbots, image generators, or research systems, can run faster, more efficiently, and without interruption. By enabling thousands of chips to work together seamlessly, Google can deliver faster responses, lower latency, and smoother performance for businesses and developers using its AI infrastructure.

To keep that vast web running smoothly, Google relies on optical circuit switching, a reconfigurable fabric that routes traffic around interruptions almost instantly. The company says its fleet has maintained about 99.999% uptime since 2020, supported by advanced liquid cooling and automated cluster management.
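For a sense of what the 99.999% uptime figure means in practice, the following Python sketch converts an availability percentage into allowed downtime per year. It is plain arithmetic rather than anything Google has published, and it assumes an average year of 365.25 days.

```python
# Convert an availability percentage into allowed downtime per year.
# Assumption: an average year of 365.25 days.

MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Return the downtime budget, in minutes per year, for a given availability."""
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR


five_nines = downtime_minutes_per_year(99.999)
print(f"{five_nines:.2f} minutes per year")  # prints "5.26 minutes per year"
```

In other words, "five nines" leaves a budget of a little over five minutes of downtime in a year.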

A co-designed software layer, including Cluster Director capabilities in Google Kubernetes Engine (GKE), MaxText, vLLM, and GKE Inference Gateway, helps squeeze every bit of performance from the hardware, cutting latency and lowering serving costs for customers operating at planetary scale.

Axion steps in where power meets practicality

Alongside Ironwood, Google expanded Axion, its line of Arm-based CPUs built to power the everyday computing that keeps AI systems running smoothly. The new instances include the N4A, now in preview, and C4A Metal, coming soon. Google says N4A delivers up to twice the price-performance of comparable x86-based virtual machines.

In simpler terms, they promise more computing power for less cost and energy, making it easier and cheaper for businesses to run the supporting tasks that AI depends on, from data processing and analytics to app hosting and system management.

Companies testing Axion say the improvements are already tangible. Vimeo, for instance, reported a 30% boost in video transcoding performance, and ZoomInfo measured a 60% improvement in price-performance for core data workloads. Rise said the new instances helped cut compute consumption by 20% while maintaining low latency and strong margins.
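Price-performance gains can be translated into cost terms with simple arithmetic. The Python sketch below assumes that price-performance means work done per dollar and that the workload stays fixed. Both are simplifying assumptions rather than definitions given by Google or the customers quoted, and the function name is hypothetical.

```python
# Translate a price-performance multiple into a cost saving for a fixed workload.
# Assumption: price-performance = work done per dollar spent.


def cost_saving_pct(price_performance_ratio: float) -> float:
    """Return the percentage cost reduction for the same amount of work."""
    if price_performance_ratio <= 0:
        raise ValueError("ratio must be positive")
    return (1 - 1 / price_performance_ratio) * 100


# "Up to 2x" price-performance: the same work for half the spend.
print(f"{cost_saving_pct(2.0):.1f}%")  # prints "50.0%"

# A 60% improvement (ratio 1.6), as ZoomInfo reported.
print(f"{cost_saving_pct(1.6):.1f}%")  # prints "37.5%"
```

On that reading, a 2× gain halves the bill for the same work, while a 60% gain trims it by a little more than a third.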

Ironwood and Axion give Google a one-two punch: raw acceleration for AI at scale, paired with efficient, general-purpose compute for everything surrounding it. It’s a full-stack strategy built for a future where intelligence never pauses, and where the cloud itself learns to think faster.

Google’s latest energy ambitions are just as audacious, with a new plan to harness solar power from orbit to keep its AI infrastructure running.

Liz Ticong

Liz Ticong is a staff writer for eWeek and TechRepublic focused on AI, cybersecurity, enterprise software, and data. She has more than 10 years of editorial experience as a technology industry writer, combining reporting, product research, and hands-on software testing in her coverage. Her work has been published on Datamation, Enterprise Networking Planet, and TechnologyAdvice.com. She writes technology news, software reviews, product comparisons, and buyer’s guides for business and IT readers.