Amazon EC2 P6e-GB200 UltraServers and P6-B200 instances

The highest GPU performance for AI training and inference

Why Amazon EC2 P6e-GB200 UltraServers and P6-B200 instances?

Amazon Elastic Compute Cloud (Amazon EC2) P6e-GB200 UltraServers, accelerated by NVIDIA GB200 NVL72, offer the highest GPU performance in Amazon EC2. They feature over 20x compute and over 11x memory under NVIDIA NVLink™ compared to P5en instances. P6e-GB200 UltraServers are ideal for the most compute- and memory-intensive AI workloads, such as training and deploying frontier models at the trillion-parameter scale.

Amazon EC2 P6-B200 instances, accelerated by NVIDIA Blackwell GPUs, are an ideal option for medium-to-large scale training and inference applications. They offer up to 2x performance compared to P5en instances for AI training and inference.

P6e-GB200 UltraServers and P6-B200 instances enable faster training for next-generation AI models and improve performance for real-time inference in production. You can use P6e-GB200 UltraServers and P6-B200 instances to train frontier foundation models (FMs), such as mixture-of-experts and reasoning models, and deploy them in generative and agentic AI applications such as content generation, enterprise copilots, and deep research agents.

Benefits

With P6e-GB200 UltraServers, customers can access up to 72 Blackwell GPUs within one NVLink domain to use 360 petaflops of FP8 compute (without sparsity) and 13.4 TB of total high-bandwidth memory (HBM3e). P6e-GB200 UltraServers provide up to 130 terabytes per second of low-latency NVLink connectivity between GPUs and up to 28.8 terabits per second of total Elastic Fabric Adapter (EFAv4) networking for AI training and inference. The P6e-GB200 UltraServer architecture gives customers a step-change improvement in compute and memory, with up to 20x GPU TFLOPS, 11x GPU memory, and 15x aggregate GPU memory bandwidth under NVLink compared to P5en instances.

P6-B200 instances provide 8 NVIDIA Blackwell GPUs with 1,440 GB of high-bandwidth GPU memory, 5th generation Intel Xeon Scalable processors (Emerald Rapids), 2 TiB of system memory, up to 14.4 TB/s of total bidirectional NVLink bandwidth, and 30 TB of local NVMe storage. These instances offer up to 2.25x GPU TFLOPS, 1.27x GPU memory size, and 1.6x GPU memory bandwidth compared to P5en instances.
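
As a quick back-of-the-envelope check, the per-GPU figures fall out of these totals. The short Python sketch below only restates the numbers quoted on this page; it does not query AWS.

```python
# Sanity-check the headline P6e-GB200 and P6-B200 figures quoted above.
ULTRASERVER_GPUS = 72      # Blackwell GPUs in one NVLink domain
FP8_PFLOPS_TOTAL = 360     # petaflops of FP8 compute, without sparsity
HBM3E_TB_TOTAL = 13.4      # total high-bandwidth memory (TB)
EFA_GBPS_PER_GPU = 400     # EFAv4 bandwidth per GPU (Gbps)

print(FP8_PFLOPS_TOTAL / ULTRASERVER_GPUS)         # 5.0 petaflops of FP8 per GPU
print(HBM3E_TB_TOTAL / ULTRASERVER_GPUS * 1000)    # ~186 GB of HBM3e per GPU
print(ULTRASERVER_GPUS * EFA_GBPS_PER_GPU / 1000)  # 28.8 Tbps per UltraServer
print(8 * EFA_GBPS_PER_GPU / 1000)                 # 3.2 Tbps per P6-B200 instance
```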

P6e-GB200 UltraServers and P6-B200 instances are powered by the AWS Nitro System with specialized hardware and firmware designed to enforce restrictions so that no one, including anyone at AWS, can access your sensitive AI workloads and data. The Nitro System, which handles networking, storage, and other I/O functions, can deploy firmware updates, bug fixes, and optimizations while it remains operational. This increases stability and reduces downtime, which is critical to meeting training timelines and running AI applications in production.

To enable efficient distributed training, P6e-GB200 UltraServers and P6-B200 instances use fourth-generation Elastic Fabric Adapter (EFAv4) networking. EFAv4 uses the Scalable Reliable Datagram (SRD) protocol to intelligently route traffic across multiple network paths, maintaining smooth operation even during congestion or failures.
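
At launch time, EFA is requested as a network interface of type "efa". The boto3 sketch below is a minimal illustration; the AMI, subnet, security group, and placement group identifiers are placeholder assumptions.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # Region is an assumption

# Launch a P6-B200 instance with an EFA interface on device index 0.
# The AMI, subnet, security group, and placement group are placeholders.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",           # e.g. an AWS Deep Learning AMI
    InstanceType="p6-b200.48xlarge",
    MinCount=1,
    MaxCount=1,
    NetworkInterfaces=[{
        "DeviceIndex": 0,
        "InterfaceType": "efa",                # request an Elastic Fabric Adapter
        "SubnetId": "subnet-0123456789abcdef0",
        "Groups": ["sg-0123456789abcdef0"],
    }],
    Placement={"GroupName": "my-training-pg"}, # cluster placement group (assumed)
)
print(response["Instances"][0]["InstanceId"])
```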

P6e-GB200 UltraServers and P6-B200 instances are deployed in Amazon EC2 UltraClusters that enable scaling up to tens of thousands of GPUs within a petabit-scale nonblocking network.

Features

Each NVIDIA Blackwell GPU features a second-generation Transformer Engine and supports new precision formats such as FP4. It supports fifth-generation NVLink, a faster, wider interconnect delivering up to 1.8 TB/s of bandwidth per GPU.

The Grace Blackwell Superchip, a key component of P6e-GB200, connects two high-performance NVIDIA Blackwell GPUs and an NVIDIA Grace CPU using the NVIDIA NVLink-C2C interconnect. Each Superchip delivers 10 petaflops of FP8 compute (without sparsity) and up to 372 GB of HBM3e. With the superchip architecture, two GPUs and one CPU are co-located within one compute module, increasing bandwidth between GPU and CPU by an order of magnitude compared to current-generation P5en instances.

P6e-GB200 UltraServers and P6-B200 instances provide 400 Gbps of EFAv4 networking per GPU, for a total of 28.8 Tbps per P6e-GB200 UltraServer and 3.2 Tbps per P6-B200 instance.

P6e-GB200 UltraServers and P6-B200 instances support Amazon FSx for Lustre file systems so you can access data at the hundreds of GB/s of throughput and millions of IOPS required for large-scale AI training and inference. P6e-GB200 UltraServers support up to 405 TB of local NVMe SSD storage, while P6-B200 instances support up to 30 TB of local NVMe SSD storage for fast access to large datasets. You can also use virtually unlimited, cost-effective storage with Amazon Simple Storage Service (Amazon S3).
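
As an illustration, a persistent FSx for Lustre file system can be created with a few lines of boto3; the subnet ID and sizing below are placeholder assumptions. A data repository association can then link the file system to an S3 bucket so training data lands on the fast file system.

```python
import boto3

fsx = boto3.client("fsx", region_name="us-east-1")  # Region is an assumption

# Create a persistent FSx for Lustre file system for training data.
# The subnet ID, capacity, and throughput tier are illustrative placeholders.
response = fsx.create_file_system(
    FileSystemType="LUSTRE",
    StorageCapacity=4800,                 # GiB, in Lustre-supported increments
    SubnetIds=["subnet-0123456789abcdef0"],
    LustreConfiguration={
        "DeploymentType": "PERSISTENT_2",
        "PerUnitStorageThroughput": 250,  # MB/s per TiB of storage
    },
)
print(response["FileSystem"]["FileSystemId"])
```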

Product Details

Instance types

| Instance Size | Blackwell GPUs | GPU memory (GB) | vCPUs | System memory (GiB) | Instance storage (TB) | Network bandwidth (Gbps) | EBS bandwidth (Gbps) | Available in EC2 UltraServers |
|---|---|---|---|---|---|---|---|---|
| p6-b200.48xlarge | 8 | 1,440 HBM3e | 192 | 2,048 | 8 x 3.84 | 8 x 400 | 100 | No |
| p6e-gb200.36xlarge | 4 | 740 HBM3e | 144 | 960 | 3 x 7.5 | 4 x 400 | 60 | Yes |

P6e-GB200 instances are only available in UltraServers.

UltraServer types

| UltraServer type | Blackwell GPUs | GPU memory (GB) | vCPUs | System memory (GiB) | UltraServer storage (TB) | Aggregate EFA bandwidth (Gbps) | EBS bandwidth (Gbps) | Available in EC2 UltraServers |
|---|---|---|---|---|---|---|---|---|
| u-p6e-gb200x72 | 72 | 13,320 | 2,592 | 17,280 | 405 | 28,800 | 1,080 | Yes |
| u-p6e-gb200x36 | 36 | 6,660 | 1,296 | 8,640 | 202.5 | 14,400 | 540 | Yes |

Customer testimonials

Here are some examples of how customers and partners have achieved their business goals with Amazon EC2 P6e-GB200 UltraServers and P6-B200 instances.

JetBrains

“We are extensively using Amazon EC2 P5en instances and are excited about the launch of the P6 and P6e instances featuring NVIDIA Blackwell GPUs, which promise substantial performance improvements. Preliminary out-of-the-box evaluations indicated >85% faster training times on P6-B200 over H200-based P5en instances across our ML pipelines, with further optimizations expected to deliver even greater gains. This advancement will help us build outstanding products for our customers.”

Vladislav Tankov, Director of AI, JetBrains


Getting started with ML use cases

Amazon SageMaker is a fully managed service for building, training, and deploying ML models. With Amazon SageMaker HyperPod (P6-B200 support coming soon), you can scale to tens, hundreds, or thousands of GPUs to train a model quickly, without worrying about setting up and managing resilient training clusters.
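
The sketch below shows the general shape of a training job with the SageMaker Python SDK. The container image, role ARN, and the ml.p6-b200.48xlarge instance type name are assumptions; check the SageMaker documentation for the instance types supported in your Region.

```python
from sagemaker.estimator import Estimator

# A minimal SageMaker training-job sketch. The image URI, role ARN, and
# instance type name below are assumptions, not values from this page.
estimator = Estimator(
    image_uri="<your-training-image-uri>",
    role="arn:aws:iam::111122223333:role/SageMakerExecutionRole",
    instance_count=2,                     # scale out for distributed training
    instance_type="ml.p6-b200.48xlarge",  # assumed SageMaker name for P6-B200
)
estimator.fit({"train": "s3://amzn-s3-demo-bucket/training-data/"})
```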

AWS Deep Learning AMIs (DLAMI) provide ML practitioners and researchers with the infrastructure and tools to accelerate deep learning in the cloud at any scale. AWS Deep Learning Containers are Docker images preinstalled with DL frameworks that streamline the deployment of custom ML environments by letting you skip the complicated process of building and optimizing environments from scratch.

If you prefer to manage your own containerized workloads through container orchestration services, you can deploy P6-B200 instances with Amazon Elastic Kubernetes Service (Amazon EKS) or Amazon Elastic Container Service (Amazon ECS).
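
For example, an ECS task definition can reserve a P6-B200 instance's GPUs through a GPU resource requirement. The family name and container image below are placeholders.

```python
import boto3

ecs = boto3.client("ecs", region_name="us-east-1")  # Region is an assumption

# Register a task definition that reserves all 8 GPUs on a P6-B200
# container instance. The family name and image are placeholders.
response = ecs.register_task_definition(
    family="blackwell-training",
    requiresCompatibilities=["EC2"],
    containerDefinitions=[{
        "name": "trainer",
        "image": "<your-training-image-uri>",
        "memory": 8192,                     # MiB reserved for the container
        "resourceRequirements": [
            {"type": "GPU", "value": "8"},  # one per Blackwell GPU
        ],
    }],
)
print(response["taskDefinition"]["taskDefinitionArn"])
```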

P6e-GB200 UltraServers will also be available through NVIDIA DGX Cloud, a fully managed environment with NVIDIA’s complete AI software stack. You get NVIDIA’s latest optimizations, benchmarking recipes, and technical expertise.
