Amazon EC2 Inf2 instances, optimized for generative AI, are now generally available
Today, AWS announces the general availability of Amazon Elastic Compute Cloud (Amazon EC2) Inf2 instances. These instances deliver high performance at the lowest cost in Amazon EC2 for generative AI models, including large language models (LLMs) and vision transformers. Inf2 instances are powered by up to 12 AWS Inferentia2 chips, the latest AWS-designed deep learning (DL) accelerators. They deliver up to four times higher throughput and up to 10 times lower latency than first-generation Amazon EC2 Inf1 instances.
 
You can use Inf2 instances to run popular applications such as text summarization, code generation, video and image generation, speech recognition, personalization, and more. Inf2 instances are the first inference-optimized instances in Amazon EC2 to introduce scale-out distributed inference supported by NeuronLink, a high-speed, nonblocking interconnect. You can now efficiently deploy models with hundreds of billions of parameters across multiple accelerators on Inf2 instances. Inf2 instances deliver up to three times higher throughput, up to eight times lower latency, and up to 40% better price performance than other comparable Amazon EC2 instances. To help you meet your sustainability goals, Inf2 instances offer up to 50% better performance per watt compared to other comparable Amazon EC2 instances.
 
Inf2 instances offer up to 2.3 petaflops of DL performance and up to 384 GB of total accelerator memory with 9.8 TB/s bandwidth. The AWS Neuron SDK integrates natively with popular machine learning frameworks, such as PyTorch and TensorFlow, so you can continue using your existing frameworks and application code to deploy on Inf2. Developers can get started with Inf2 instances using AWS Deep Learning AMIs, AWS Deep Learning Containers, or managed services such as Amazon Elastic Container Service (Amazon ECS), Amazon Elastic Kubernetes Service (Amazon EKS), and Amazon SageMaker.
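As a rough illustration of what the memory figures above mean for model sizing, the following sketch estimates how many Inferentia2 chips are needed just to hold a model's weights. It is a back-of-the-envelope calculation, not an AWS sizing tool. It assumes the 384 GB of accelerator memory is split evenly across the 12 chips, that weights are stored at 16-bit precision (2 bytes per parameter), and that 1 GB is 10^9 bytes. The function names and the 175-billion-parameter model are hypothetical, and the estimate ignores activations, caches, and runtime overhead.

```python
import math

# Figures quoted in the announcement for the largest Inf2 configuration.
MAX_CHIPS = 12          # up to 12 Inferentia2 chips per instance
TOTAL_MEMORY_GB = 384   # up to 384 GB of total accelerator memory


def per_chip_memory_gb() -> float:
    """Memory per chip, ASSUMING the total is divided evenly across chips."""
    return TOTAL_MEMORY_GB / MAX_CHIPS


def weight_footprint_gb(num_params: int, bytes_per_param: int = 2) -> float:
    """Approximate size of the model weights alone, in GB (10^9 bytes).

    bytes_per_param=2 is an assumption corresponding to 16-bit weights.
    """
    return num_params * bytes_per_param / 1e9


def min_chips_for_weights(num_params: int, bytes_per_param: int = 2) -> int:
    """Smallest number of chips whose combined memory holds the weights.

    Ignores activations, caches, and runtime overhead, so real deployments
    need more headroom than this lower bound suggests.
    """
    footprint = weight_footprint_gb(num_params, bytes_per_param)
    return math.ceil(footprint / per_chip_memory_gb())


if __name__ == "__main__":
    params = 175_000_000_000  # hypothetical 175-billion-parameter model
    print(f"Per-chip memory: {per_chip_memory_gb():.0f} GB")
    print(f"Weight footprint: {weight_footprint_gb(params):.0f} GB")
    print(f"Chips needed for weights: {min_chips_for_weights(params)} of {MAX_CHIPS}")
```

Under these assumptions, the hypothetical 175-billion-parameter model needs about 350 GB for its weights, which spans 11 of the 12 chips. This is why distributing a model of that scale across multiple accelerators is necessary.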
 
Inf2 instances are available in four sizes (inf2.xlarge, inf2.8xlarge, inf2.24xlarge, and inf2.48xlarge) in the following AWS Regions: US East (N. Virginia) and US East (Ohio). You can purchase them as On-Demand Instances, Reserved Instances, or Spot Instances, or as part of a Savings Plan.
To learn more about Inf2 instances, see the Amazon EC2 Inf2 Instances webpage and the AWS Neuron Documentation.