Google Cloud has announced its eighth generation of custom Tensor Processing Units (TPUs), introducing two distinct chips purpose-built for the demands of the "agentic era." As AI moves from single-turn chatbots to autonomous agents that reason, plan, and execute multi-step workflows, Google is splitting its infrastructure to optimize for two fundamentally different workloads: training and inference.
Two Specialized Chips
TPU 8t (Training Powerhouse): Engineered to reduce the development cycle for frontier models from months to weeks. It features massive scale-up bandwidth, 121 ExaFlops of compute per superpod, and 10x faster storage access to maximize system utilization. It is built to achieve near-linear scaling for up to a million chips in a single cluster.
TPU 8i (Reasoning Engine): Optimized for latency-sensitive, high-throughput inference, the "heartbeat" of AI agents. It pairs 288 GB of high-bandwidth memory (HBM) with 3x more on-chip SRAM to keep active model working sets on-chip, effectively breaking the "memory wall" and enabling swarms of agents to run in real time.
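To make the "near-linear scaling" claim for TPU 8t concrete: scaling efficiency is commonly measured as achieved throughput divided by the throughput that perfectly linear scaling would deliver. The sketch below is illustrative only. The function name and every number in it are hypothetical and are not Google figures.

```python
def scaling_efficiency(base_chips: int, base_throughput: float,
                       chips: int, throughput: float) -> float:
    """Return achieved throughput as a fraction of ideal linear scaling.

    All inputs are hypothetical measurements, e.g. training tokens per
    second observed at two different cluster sizes.
    """
    if base_chips <= 0 or chips <= 0 or base_throughput <= 0:
        raise ValueError("chip counts and base throughput must be positive")
    # Ideal linear scaling: throughput grows in proportion to chip count.
    ideal = base_throughput * (chips / base_chips)
    return throughput / ideal


# Hypothetical example: 4x the chips delivers 3.8x the throughput.
efficiency = scaling_efficiency(1000, 100.0, 4000, 380.0)
print(f"scaling efficiency: {efficiency:.0%}")
```

A value close to 1.0 is what "near-linear" means in practice; the further it drops as chips are added, the more time the cluster is losing to communication and synchronization.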
Engineering for the Agentic Era
Google has redesigned the TPU stack to eliminate the "waiting room" effect where agents sit idle while waiting for data:
Virgo Network & Boardfly Topology: These new networking and interconnect architectures reduce network diameter by 50% and double the Inter-Chip Interconnect (ICI) bandwidth, so that multi-agent "swarms" function as a single, low-latency unit.
Axion-Powered Efficiency: For the first time, both chips run on Google’s custom Axion Arm-based CPUs, allowing for full-stack optimization and better energy efficiency (2x better performance per watt than the previous generation, "Ironwood").
On-Chip Acceleration: A new Collectives Acceleration Engine (CAE) offloads global operations, reducing on-chip latency by up to 5x.
Reliability at Scale
Because downtime is the enemy of frontier-scale training, TPU 8t is engineered to target over 97% "goodput" (productive compute time). This is achieved through:
Real-time Telemetry: Granular monitoring across tens of thousands of chips.
Automatic Rerouting: The ability to detect and bypass faulty links without interrupting live training jobs.
Optical Circuit Switching (OCS): Hardware that reconfigures around failures autonomously, without human intervention.
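For a sense of what a 97% goodput target implies, the sketch below treats goodput as productive compute time divided by total wall-clock time, following the definition given above. The helper names and the 30-day job length are hypothetical and chosen only for illustration.

```python
def goodput(productive_hours: float, total_hours: float) -> float:
    """Fraction of wall-clock time spent on productive compute."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    if not 0 <= productive_hours <= total_hours:
        raise ValueError("productive_hours must be between 0 and total_hours")
    return productive_hours / total_hours


def lost_time_budget(total_hours: float, target: float = 0.97) -> float:
    """Hours that may be lost (failures, restarts, stalls) at a goodput target."""
    if not 0 < target <= 1:
        raise ValueError("target must be in (0, 1]")
    return total_hours * (1 - target)


# Hypothetical 30-day training run (720 wall-clock hours).
run_hours = 30 * 24
print(f"lost-time budget at 97%: {lost_time_budget(run_hours):.1f} hours")
print(f"goodput with 20 hours lost: {goodput(run_hours - 20, run_hours):.3f}")
```

Under these assumptions, a month-long run has less than a day of total slack, which is why rerouting and reconfiguration without job interruption matter at this scale.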
Availability and Ecosystem
Open Software Support: Both chips run natively on the frameworks developers already use: JAX, MaxText, PyTorch, SGLang, and vLLM.
Launch Timeline: Both TPU 8t and 8i will be generally available later this year as part of Google’s AI Hypercomputer stack.
This launch represents a foundational shift. As businesses move from "simple AI" to "agentic AI" that executes continuous loops of reasoning, the bottleneck shifts from raw compute to memory bandwidth and interconnect latency. By specializing silicon for these agentic bottlenecks, Google is setting the stage for a new generation of autonomous applications that are as fast as they are capable. Interested developers can request more information on the Google Cloud website.