
AWS AI Chips

AWS Trainium

Trainium — purpose-built for high-performance, cost-efficient AI at scale

Why Trainium?

AWS Trainium is a family of purpose-built AI accelerators — Trainium1, Trainium2, and Trainium3 — designed to deliver scalable performance and cost efficiency for training and inference across a broad range of generative AI workloads.

The AWS Trainium Family

Trainium1

The first-generation AWS Trainium chip powers Amazon Elastic Compute Cloud (Amazon EC2) Trn1 instances, which offer up to 50% lower training costs than comparable Amazon EC2 instances. Many customers, including Ricoh, Karakuri, SplashMusic, and Arcee AI, are realizing the performance and cost benefits of Trn1 instances.

Trainium2

The AWS Trainium2 chip delivers up to 4x the performance of first-generation Trainium. Trainium2-based Amazon EC2 Trn2 instances and Trn2 UltraServers are purpose-built for generative AI and offer 30-40% better price performance than GPU-based EC2 P5e and P5en instances. Trn2 instances feature up to 16 Trainium2 chips, and Trn2 UltraServers feature up to 64 Trainium2 chips interconnected with NeuronLink, our proprietary chip-to-chip interconnect. You can use Trn2 instances and UltraServers to train and deploy the most demanding models, including large language models (LLMs), multi-modal models, and diffusion transformers, to build a broad set of next-generation generative AI applications.

Trainium3

Trainium3, AWS's first 3nm AI chip, is purpose-built to deliver the best token economics for next-generation agentic, reasoning, and video generation applications. The AWS Trainium3 chip provides 2x higher compute performance, reaching 2.52 petaflops (PFLOPs) of FP8 compute, and increases memory capacity by 1.5x and memory bandwidth by 1.7x over Trainium2, to 144 GB of HBM3e memory and 4.9 TB/s of memory bandwidth. Trn3 UltraServers, powered by Trainium3, deliver up to 4.4x higher performance, 3.9x higher memory bandwidth, and over 4x better energy efficiency compared to Trn2 UltraServers. Trainium3 is designed for both dense and expert-parallel workloads, with advanced data types (MXFP8 and MXFP4) and an improved memory-to-compute balance for real-time, multimodal, and reasoning tasks.

Built for Developers

New Trainium3-based UltraServers are built for AI researchers and powered by the AWS Neuron SDK to unlock breakthrough performance.

With native PyTorch integration, developers can train and deploy without changing a single line of code. For AI performance engineers, we’ve enabled deeper access to Trainium3, so developers can fine-tune performance, customize kernels, and push their models even further. Because innovation thrives on openness, we are committed to engaging with our developers through open-source tools and resources.
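As a rough illustration of that PyTorch path, the sketch below shows a standard training loop targeting a Trainium device through the torch-xla backend that ships with the Neuron SDK (torch-neuronx). The model, data, and hyperparameters are placeholders, not a recommended configuration.

```python
# Minimal sketch of a PyTorch training loop on Trainium via the torch-xla
# backend bundled with the AWS Neuron SDK (torch-neuronx). Model, data, and
# hyperparameters are placeholders for illustration only.
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()  # Trainium chips are exposed as XLA devices

model = torch.nn.Linear(1024, 1024).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = torch.nn.MSELoss()

for step in range(10):
    x = torch.randn(8, 1024, device=device)
    y = torch.randn(8, 1024, device=device)
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    xm.optimizer_step(optimizer)  # steps the optimizer and marks the XLA step
```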

To learn more, visit Amazon EC2 Trn3 UltraServers and explore the AWS Neuron SDK.

Benefits

Trn3 UltraServers feature the latest innovations in scale-up UltraServer technology, with NeuronSwitch-v1 for faster all-to-all collectives across up to 144 Trainium3 chips. A Trn3 UltraServer provides up to 20.7 TB of HBM3e, 706 TB/s of memory bandwidth, and 362 MXFP8 PFLOPs, delivering up to 4.4x more performance and over 4x better energy efficiency than Trn2 UltraServers. Trn3 provides the highest performance at the lowest cost for training and inference with the latest 1T+ parameter MoE and reasoning models, and drives significantly higher throughput for GPT-OSS serving at scale compared to Trainium2-based instances.

Trn2 UltraServers remain a high-performance, cost-effective option for generative AI training and inference of models up to 1T parameters. Trn2 instances feature up to 16 Trainium2 chips, and Trn2 UltraServers feature up to 64 Trainium2 chips connected with NeuronLink, a proprietary chip-to-chip interconnect.

Trn1 instances feature up to 16 Trainium chips and deliver up to 3 FP8 PFLOPs, 512 GB of HBM with 9.8 TB/s of memory bandwidth, and up to 1.6 Tbps of EFA networking.

AWS Neuron SDK helps you extract the full performance from Trn3, Trn2, and Trn1 instances so you can focus on building and deploying models and accelerating your time to market. AWS Neuron integrates natively with PyTorch and JAX, and with essential libraries such as Hugging Face, vLLM, PyTorch Lightning, and others. It optimizes models out of the box for distributed training and inference, while providing deep insights for profiling and debugging. AWS Neuron integrates with services such as Amazon SageMaker, Amazon SageMaker HyperPod, Amazon Elastic Kubernetes Service (Amazon EKS), Amazon Elastic Container Service (Amazon ECS), AWS ParallelCluster, and AWS Batch, as well as third-party services like Ray (Anyscale), Domino Data Lab, and Datadog.
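As one simplified example of that out-of-the-box flow for inference, a PyTorch model can be compiled ahead of time for Neuron devices with torch_neuronx.trace; the toy model and input shape below are placeholders.

```python
# Sketch: ahead-of-time compilation of a PyTorch model for Neuron devices
# using torch_neuronx (part of the AWS Neuron SDK). The toy model and input
# shape are placeholders.
import torch
import torch_neuronx

model = torch.nn.Sequential(
    torch.nn.Linear(128, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
).eval()

example_input = torch.randn(1, 128)

# Compile the model for Neuron; the returned module runs on the accelerator
neuron_model = torch_neuronx.trace(model, example_input)
output = neuron_model(example_input)
```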

To deliver high performance while meeting accuracy goals, AWS Trainium supports a range of mixed precision data types such as BF16, FP16, FP8, MXFP8, and MXFP4. To support the fast pace of innovation in generative AI, Trainium2 and Trainium3 feature hardware optimizations for 4x sparsity (16:4), micro-scaling, stochastic rounding, and dedicated collective engines.
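As a hedged illustration, one common way to opt into reduced precision on the PyTorch/XLA path is the BF16 convenience flag shown below; FP8, MXFP8, and MXFP4 are selected through Neuron compiler and framework options rather than this flag.

```python
# Sketch: running FP32 PyTorch code in BF16 on Trainium via a torch-xla
# convenience flag. This is one illustrative route only; FP8/MXFP8/MXFP4 usage
# goes through Neuron compiler/framework options and is not shown here.
import os
os.environ["XLA_USE_BF16"] = "1"  # downcast FP32 ops to BF16 on the XLA device

import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
layer = torch.nn.Linear(256, 256).to(device)
x = torch.randn(4, 256, device=device)
y = layer(x)
xm.mark_step()  # flush the lazily built graph to the Trainium device
```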

Neuron enables developers to optimize their workloads using the Neuron Kernel Interface (NKI) for kernel development. NKI exposes the full Trainium ISA, enabling complete control over instruction-level programming, memory allocation, and execution scheduling. Along with building their own kernels, developers can use the open-source Neuron Kernel Library of ready-to-deploy optimized kernels. Lastly, Neuron Explore provides full-stack visibility, connecting developers' code down to the engines in hardware.
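For a flavor of what an NKI kernel looks like, here is a minimal element-wise addition sketch following the publicly documented NKI pattern; module paths and argument names may differ by Neuron SDK version, so treat it as illustrative rather than copy-paste ready.

```python
# Minimal NKI kernel sketch: element-wise tensor addition. APIs follow the
# documented NKI pattern but may vary by Neuron SDK version.
from neuronxcc import nki
import neuronxcc.nki.language as nl


@nki.jit
def tensor_add_kernel(a_input, b_input):
    # Allocate the kernel output in device HBM
    c_output = nl.ndarray(a_input.shape, dtype=a_input.dtype, buffer=nl.shared_hbm)

    # Load tiles from HBM into on-chip memory, compute, and store the result back
    a_tile = nl.load(a_input)
    b_tile = nl.load(b_input)
    c_tile = nl.add(a_tile, b_tile)
    nl.store(c_output, value=c_tile)

    return c_output
```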

Customers

Customers such as Anthropic, Decart, poolside, Databricks, Ricoh, Karakuri, SplashMusic, and others are realizing the performance and cost benefits of Trn1, Trn2, and Trn3 instances and UltraServers.

Early adopters of Trn3 are achieving new levels of efficiency and scalability for the next generation of large-scale generative AI models.


Conquer AI performance, cost, and scale

AWS Trainium2 for breakthrough AI performance

AWS AI chips customer stories