Introducing AI on EKS: powering scalable AI workloads with Amazon EKS
This blog post was jointly authored by Vara Bonthu, Principal OSS Specialist Solutions Architect, and Omri Shiv, Senior Open Source ML Engineer.
Introduction
We’re excited to announce the launch of AI on EKS: a new open source initiative from Amazon Web Services (AWS) designed to help customers deploy, scale, and optimize artificial intelligence/machine learning (AI/ML) workloads on Amazon Elastic Kubernetes Service (Amazon EKS). AI on EKS provides deployment-ready blueprints for training, fine-tuning, and serving multiple large language models (LLMs); infrastructure as code (IaC) templates for reference architectures and customizable platforms; benchmarks comparing different training and deployment strategies; and best practices for tasks such as model training, inference, fine-tuning, multi-model serving, and more.
This new project builds on the popular Data on EKS (DoEKS) initiative, which, since its launch in 2023, has supported a broad spectrum of data and AI workloads. Customers have used DoEKS to run distributed data processing with Apache Spark and Apache Flink, stream processing with Apache Kafka, SQL analytics with Trino, orchestration with Apache Airflow, and AI/ML workloads such as model training with Ray and serving with vLLM—all using Kubernetes-native best practices on AWS.
To support the growing demand for more specialized AI/ML solutions and enable faster innovation, we have split the project into two focused and complementary GitHub repositories:
- AI on EKS is the single repository for AI/ML blueprints for EKS. This includes distributed training, LLM inference, generative AI pipelines, multi-model serving, Agentic AI, GPU and Neuron-specific benchmarks, and MLOps best practices.
- Data on EKS continues to serve as the go-to resource for data analytics, distributed computing, and database workload patterns. This includes Spark, Flink, Trino, Airflow, Kafka, ClickHouse, and PostgreSQL.
This separation lets each project focus its contributions and maintenance on a single domain, AI/ML or data, while preserving interoperability between data and AI workloads running on Amazon EKS.
Figure 1: AI on EKS website
What is AI on EKS?
AI on EKS is an open source project hosted in the AWS Labs GitHub organization that delivers modular Terraform infrastructure templates and curated deployment blueprints for running scalable AI workloads on Amazon EKS. It lets platform and MLOps teams set up the infrastructure while data scientists and ML engineers focus on model development and experimentation, abstracting away the complexity of infrastructure setup, model deployment, load balancing, autoscaling, storage, and performance tuning.
As the demand for AI/ML workloads on Kubernetes continues to rise, organizations are encountering a new set of infrastructure and operational challenges. Although Amazon EKS provides a robust foundation, deploying AI systems at scale—especially those involving large models, accelerators, and distributed processing—necessitates more specialized capabilities. Teams often struggle with managing complex GPU infrastructure, rising costs from inefficient scaling, and integrating fragmented open source tools that don’t always work seamlessly together. AI on EKS aims to address these challenges with deeply integrated blueprints, best practices, and IaC templates for the following:
- Provisioning EKS clusters: Creating secure and scalable EKS clusters with support for multiple accelerators for AI/ML workloads.
- Reference architectures: Tested architectures that let you start AI/ML workloads quickly in a proven environment suited to the workload.
- Composable environments: Configurable components for assembling custom architectures on a foundational infrastructure that supports a wide range of workloads.
- Optimizing Kubernetes for accelerators: Installing and configuring the essential components needed to run AI/ML training and inference.
- EKS-optimized AMIs and container images: Using EKS-optimized Amazon Machine Images (AMIs) for GPU and Neuron workloads, with guidance on choosing compatible container images for PyTorch, TensorFlow, DeepSpeed, NVIDIA NeMo, Hugging Face Transformers, and more.
- Scalable model inference and multi-model deployment: Deploying LLM and multi-model inference workloads using frameworks such as Ray Serve, NVIDIA Triton, AIBrix, vLLM, and NVIDIA Dynamo, with support for multi-model caching, autoscaling, and low-latency serving.
- Distributed training at scale: Running high-performance training jobs across multiple nodes using frameworks such as Ray, PyTorch, DeepSpeed, or NVIDIA NeMo, with support for tensor/pipeline parallelism, multi-threaded model downloading, shared file systems (such as EFS, FSx for Lustre), and data locality optimization.
- GPU optimization techniques: Applying advanced GPU management techniques such as Multi-Instance GPU (MIG), Kubernetes Dynamic Resource Allocation (DRA), and time-slicing to improve utilization, performance isolation, and cost efficiency across shared workloads (a time-slicing sketch follows this list).
- Observability for AI infrastructure: Implementing real-time observability for GPU and Neuron-backed workloads—such as training, inference, and multi-model endpoints—using Prometheus, Grafana, Amazon CloudWatch, and custom exporters for resource metrics, logs, and latency analysis.
- Cost optimization and monitoring: Enabling visibility and control over resource usage with strategies such as spot instance integration, autoscaling policies, and right-sizing recommendations for AI workloads to minimize operational costs.
- Custom schedulers for AI workloads: Managing distributed training and inference workloads with custom Kubernetes schedulers such as Kueue, Volcano, Apache YuniKorn, and Leader Worker Set (LWS). This supports queueing, resource borrowing, gang scheduling, and workload-aware prioritization.
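As a concrete example of the GPU optimization techniques above, time-slicing with the NVIDIA device plugin is driven by a small ConfigMap. The following is a minimal sketch rather than a blueprint default: the namespace, ConfigMap name, and replica count are illustrative, and the device plugin must be pointed at the config (for example through its Helm chart's config.name and config.default values).

```bash
# Minimal sketch: advertise each physical GPU as 4 schedulable nvidia.com/gpu
# resources. Namespace, ConfigMap name, and replica count are illustrative.
cat <<'EOF' | kubectl apply -n nvidia-device-plugin -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
data:
  any: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
        - name: nvidia.com/gpu
          replicas: 4
EOF
```

With a configuration like this in place, pods keep requesting nvidia.com/gpu: 1 as usual, but four such pods can share a single physical GPU.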
Project structure
The project is organized into the following categories:
1. Infra
This section provides modular Terraform templates designed to deploy and configure the foundational infrastructure for scalable AI/ML workloads on Amazon EKS. Key components include the following:
- Deploy GPU or Neuron-accelerated EKS clusters using either the traditional provisioning model or the newly introduced Amazon EKS Auto Mode, which streamlines cluster creation with built-in best practices and optimized defaults. The EKS Auto Mode deployment option is coming soon.
- Set up Karpenter for intelligent autoscaling (a sample GPU NodePool sketch follows this list).
- Install job schedulers and custom Kubernetes schedulers (for example, Volcano, Kueue, and Argo Workflows).
- Integrate with observability and cost stacks (Prometheus, Grafana, Amazon Managed Service for Prometheus, Amazon Managed Grafana, Fluent Bit, Kubecost, and AWS Split Cost Allocation Data).
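To make the autoscaling item above concrete, here is a minimal sketch of a Karpenter NodePool for GPU nodes, assuming the Karpenter v1 API and an existing EC2NodeClass; the names, instance families, taint, and limits are illustrative rather than the templates' defaults.

```bash
# Minimal sketch of a GPU NodePool (Karpenter v1 API). Names, instance
# families, and limits are illustrative, not the AI on EKS defaults.
cat <<'EOF' | kubectl apply -f -
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default            # assumes an existing EC2NodeClass named "default"
      requirements:
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["g5", "g6"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand", "spot"]
      taints:
        - key: nvidia.com/gpu
          effect: NoSchedule     # keep non-GPU pods off GPU nodes
  limits:
    nvidia.com/gpu: 16           # cap the total GPUs this pool can provision
EOF
```

The requirements constrain provisioning to GPU instance families, the taint keeps non-GPU pods off those nodes, and the limit caps how many GPUs the pool can launch.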
2. Blueprints
The Blueprints section provides prebuilt reference implementations designed for specific AI/ML use cases:
- Training: Ray on EKS, PyTorch DDP training with FSx for Lustre
- Inference: Ray with vLLM, NVIDIA Triton Server with TensorRT
- Multi-model Serving: NVIDIA Triton with shared Amazon EFS caching
- Agentic AI & RAG: RAG pipelines using LlamaIndex, LangChain, and vector stores such as PGVector (coming soon)
Each blueprint provides a complete documentation package that includes architectural design specifications, detailed deployment instructions, comprehensive observability and cost metrics, performance benchmarks for the models, and recommended best practices for deployment on Amazon EKS.
How to deploy
To deploy one of the blueprints (for example, Mistral-7B-Instruct-v0.2 with RayServe and vLLM), follow these steps:
Clone the ai-on-eks repository that contains the Terraform code for this deployment pattern:
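The project is hosted in the AWS Labs GitHub organization, so a typical clone looks like the following:

```bash
git clone https://github.com/awslabs/ai-on-eks.git
cd ai-on-eks
```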
Navigate to the JARK deployment directory and run the install script to deploy the infrastructure:
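A minimal sketch of this step is shown below. The directory path is an assumption based on the project structure described earlier (infrastructure templates under infra/), so confirm the exact location in the repository README.

```bash
# Illustrative path: the JARK stack Terraform is assumed to live under infra/;
# check the repository for the exact directory.
cd infra/jark-stack/terraform

# The install script wraps terraform init and apply for the stack
./install.sh
```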
This deployment takes approximately 15-20 minutes to complete.
When the installation finishes, the Terraform output includes the command for configuring kubectl. Run it to set up access to the EKS cluster:
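The command has the following general form; take the actual Region and cluster name from the Terraform output rather than the placeholders shown here.

```bash
# Placeholders: use the Region and cluster name printed by the install step
aws eks update-kubeconfig --region <aws-region> --name <cluster-name>

# Verify access to the cluster
kubectl get nodes
```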
You’re now ready to deploy your first model workload using one of the available blueprints.
Deploying Mistral-7B-Instruct-v0.2 with RayServe and vLLM:
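The following is a sketch of this step, assuming the inference blueprints ship as Kubernetes manifests that you apply with kubectl. The directory, manifest, and namespace names are illustrative, so follow the blueprint's documentation for the exact files. Note that Mistral-7B-Instruct-v0.2 is a gated model on Hugging Face, so the deployment needs a Hugging Face token.

```bash
# Illustrative paths and names; check the blueprint documentation for the
# exact directory, manifest, and namespace.
cd blueprints/inference/vllm-rayserve-gpu   # from the repository root

# Mistral-7B-Instruct-v0.2 is gated on Hugging Face; export a token that the
# manifest can consume (for example via envsubst or a Kubernetes Secret).
export HUGGING_FACE_HUB_TOKEN=<your-hf-token>

# Apply the RayService manifest that serves the model with vLLM on Ray Serve
envsubst < ray-service-vllm.yaml | kubectl apply -f -

# Watch the Ray head and worker pods start and the Serve application deploy
kubectl get pods -n rayserve-vllm -w
```

Once the pods are running, you can port-forward the Ray Serve service and send a test request to validate the endpoint.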
Getting involved
AI on EKS provides practical starting points for running AI/ML workloads on Kubernetes. These blueprints help reduce time spent on infrastructure setup, making it easier to get from experimentation to deployment. They also support cost-aware scaling with examples for GPU scheduling, observability, and resource monitoring. AI on EKS shares configurations for commonly used ML frameworks and accelerators. This allows teams to get started without needing to build everything from scratch.
This project is built by AWS Solutions Architects and contributors from the open source Kubernetes and AI/ML community. It is developed in the open, and we actively welcome feedback, contributions, and new ideas. If you’re running AI/ML workloads on Kubernetes or exploring LLMs, RAG, or fine-tuning on Amazon EKS, then your experience and input are invaluable.
Ways to contribute:
- Submit issues and feature requests through GitHub.
- Create a PR for a new blueprint once the issue has been approved by the maintainers.
- Join community discussions at the CNCF #ai-on-eks Slack channel and shape the roadmap.
Please review the CONTRIBUTING.md guide for more details.
Next steps
To get started with AI on EKS:
- Visit the AI on EKS website for documentation, blueprints, and architecture guidance.
- Explore the AI on EKS GitHub repo.
- Try out a blueprint and start running AI workloads on Amazon EKS.
AI on EKS brings together cloud-native design, open source flexibility, and AWS operational excellence to help you build, scale, and optimize AI systems with confidence.
Contributors
The team behind the AI on EKS project: Vara Bonthu, Omri Shiv, Apoorva Kulkarni, Manabu McCloskey, Ratnopam Chakrabarti, Lucas Soriano Alves Duarte, Shivam Dubey, Tiago Reichert, Alan Halcyon, Sheetal Joshi, Vicky Whittingham, Christina Andonov, Ramya Sarangarajan, Aditya Ramakrishnan, Divya Gupta, Premdass Ravidass, Vincent Wang, Vijoy Choyi, Arun Nalpet Ramakrishna, Aastha Varma, and Scott Perry, with support from the Amazon EKS and AWS Neuron product teams!