This Guidance demonstrates how to streamline access to numerous large language models (LLMs) through a unified, industry-standard API gateway based on the OpenAI API standard. By deploying this Guidance, you can simplify integration while gaining access to tools that track LLM usage, manage costs, and implement crucial governance features. This allows easy switching between models, efficient management of multiple LLM services within applications, and robust control over security and expenses.
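Because the gateway exposes an OpenAI-compatible API, the standard OpenAI SDK can call it directly, and switching providers is only a model-name change. The sketch below is illustrative: the endpoint URL, virtual API key, and model identifier are placeholders for an actual deployment.

```python
def chat_payload(model: str, prompt: str) -> dict:
    """OpenAI-format chat payload; switching providers only changes `model`."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def ask_gateway(base_url: str, api_key: str, model: str, prompt: str) -> str:
    """Call the gateway with the standard OpenAI SDK (requires the `openai`
    package and a reachable deployment; not executed here)."""
    from openai import OpenAI
    client = OpenAI(base_url=base_url, api_key=api_key)
    resp = client.chat.completions.create(**chat_payload(model, prompt))
    return resp.choices[0].message.content

# Example (placeholder endpoint and LiteLLM virtual key):
# ask_gateway("https://gateway.example.com/v1", "sk-litellm-...",
#             "bedrock/anthropic.claude-3-sonnet-20240229-v1:0", "Hello")
```

The same `chat_payload` shape works for every provider the gateway is configured with, which is what makes model switching a one-line change in client code.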
Architecture Diagram

Step 1
Tenants and client applications access the LiteLLM gateway proxy API through an Amazon Route 53 URL endpoint or an Amazon CloudFront distribution, protected against common web exploits and bots by AWS WAF.
Step 2
AWS WAF forwards requests to an Application Load Balancer (ALB) that automatically distributes incoming application traffic to Amazon Elastic Container Service (Amazon ECS) tasks or Amazon Elastic Kubernetes Service (Amazon EKS) pods running generative AI gateway containers. TLS/SSL encryption secures traffic to the load balancer using a certificate issued by AWS Certificate Manager (ACM).
Step 3
Container images for the API/middleware and LiteLLM applications are built during Guidance deployment and pushed to Amazon Elastic Container Registry (Amazon ECR). They are used for deployment to Amazon ECS on AWS Fargate or Amazon EKS clusters that run these applications as containers in ECS tasks or EKS pods, respectively.
Step 3 (continued)
LiteLLM provides a unified application interface for configuring and interacting with LLM providers. The API/middleware integrates natively with Amazon Bedrock to enable features not supported by the LiteLLM open source project.
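LiteLLM's unified interface is driven by a model list that maps client-facing aliases to provider-specific model identifiers. The fragment below is an illustrative LiteLLM proxy configuration, not the one shipped with this Guidance; model IDs, region, and aliases are placeholders.

```yaml
# Illustrative LiteLLM proxy config (aliases, model IDs, and region are placeholders)
model_list:
  - model_name: claude-sonnet            # alias clients request through the gateway
    litellm_params:
      model: bedrock/anthropic.claude-3-sonnet-20240229-v1:0
      aws_region_name: us-east-1
  - model_name: gpt-4o                   # external provider behind the same alias scheme
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY # resolved from the environment or Secrets Manager
```

Clients request the alias (`claude-sonnet`, `gpt-4o`); LiteLLM translates the call to the matching provider API, so backends can be swapped without client changes.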
Step 4
Models hosted on Amazon Bedrock, including Amazon Nova, provide model access, guardrails, prompt caching, and routing, enhancing the AI gateway with additional controls for clients through a unified API. Access to the required Amazon Bedrock models must be properly configured.
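As a hedged sketch of what the API/middleware does under the hood, the Bedrock Converse API can be invoked directly with boto3. The model ID and region below are placeholder assumptions, and the call requires AWS credentials plus model access enabled in the account.

```python
def converse_messages(text: str) -> list:
    """One user turn in the Bedrock Converse API message format."""
    return [{"role": "user", "content": [{"text": text}]}]

def invoke_bedrock(model_id: str, text: str, region: str = "us-east-1") -> str:
    """Invoke a Bedrock-hosted model directly (requires boto3, AWS credentials,
    and model access granted in the account; not executed here)."""
    import boto3
    runtime = boto3.client("bedrock-runtime", region_name=region)
    resp = runtime.converse(modelId=model_id, messages=converse_messages(text))
    return resp["output"]["message"]["content"][0]["text"]

# Example (assumed model ID):
# invoke_bedrock("amazon.nova-lite-v1:0", "Hello")
```

If model access has not been granted in the Amazon Bedrock console, calls like this fail with an access-denied error, which is why Step 4 requires access to be configured up front.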
Step 5
External model providers (such as OpenAI, Anthropic, or Vertex AI) are configured using the LiteLLM Admin UI to enable additional model access through LiteLLM's unified application interface. Pre-existing third-party provider configurations can also be integrated into the gateway using the LiteLLM APIs.
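Alongside the Admin UI, LiteLLM exposes a model-management endpoint (`/model/new`) for registering providers programmatically. The sketch below assumes a deployed gateway; the URL, master key, and credential reference are placeholders.

```python
import json
import urllib.request

def model_registration(alias: str, litellm_model: str, api_key_ref: str) -> dict:
    """Request body for LiteLLM's /model/new management endpoint."""
    return {
        "model_name": alias,  # alias clients will request
        "litellm_params": {"model": litellm_model, "api_key": api_key_ref},
    }

def register_model(gateway_url: str, master_key: str, body: dict) -> int:
    """POST the registration to a live gateway (not executed here)."""
    req = urllib.request.Request(
        f"{gateway_url}/model/new",
        data=json.dumps(body).encode(),
        headers={"Authorization": f"Bearer {master_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

# Example (placeholder URL and keys):
# register_model("https://gateway.example.com", "sk-master-...",
#                model_registration("gpt-4o", "openai/gpt-4o",
#                                   "os.environ/OPENAI_API_KEY"))
```

Passing a credential reference such as `os.environ/OPENAI_API_KEY` instead of a literal key keeps provider secrets out of the gateway configuration itself.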
Step 6
LiteLLM integrates with Amazon ElastiCache (Redis OSS), Amazon Relational Database Service (Amazon RDS), and AWS Secrets Manager. Amazon ElastiCache enables multi-tenant distribution of application settings and prompt caching. Amazon RDS persists virtual API keys and other configuration settings managed by LiteLLM. Secrets Manager securely stores external model provider credentials and other sensitive settings.
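For the Secrets Manager integration, a provider credential is typically stored as a JSON payload and read back at runtime. The secret name and field below are placeholder assumptions for illustration.

```python
import json

def parse_secret(secret_string: str, field: str) -> str:
    """Pull one field out of a JSON-encoded secret payload."""
    return json.loads(secret_string)[field]

def fetch_provider_key(secret_id: str, region: str = "us-east-1") -> str:
    """Read a provider credential from AWS Secrets Manager (requires boto3
    and AWS credentials; not executed here)."""
    import boto3
    sm = boto3.client("secretsmanager", region_name=region)
    raw = sm.get_secret_value(SecretId=secret_id)["SecretString"]
    return parse_secret(raw, "api_key")

# Example (placeholder secret name):
# fetch_provider_key("genai-gateway/openai")
```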
Step 7
The LiteLLM and API/middleware applications send logs to a dedicated Amazon Simple Storage Service (Amazon S3) bucket for troubleshooting and access analysis.
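Once logs land in the bucket, they can be enumerated for analysis with boto3. The bucket name and key layout below are assumptions for illustration, not the Guidance's actual naming scheme.

```python
def log_prefix(app: str, day: str) -> str:
    """Assumed key layout for one day's logs from one application."""
    return f"{app}/{day}/"

def list_logs(bucket: str, prefix: str) -> list:
    """List matching log objects in the bucket (requires boto3 and AWS
    credentials; not executed here)."""
    import boto3
    resp = boto3.client("s3").list_objects_v2(Bucket=bucket, Prefix=prefix)
    return [o["Key"] for o in resp.get("Contents", [])]

# Example (placeholder bucket name):
# list_logs("genai-gateway-logs", log_prefix("litellm", "2024-06-01"))
```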
Well-Architected Pillars

The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.
The architecture diagram above is an example of a solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
-
Operational Excellence
LiteLLM application logs are stored in S3 buckets for audit and analysis purposes. Amazon ECS and Amazon EKS feature built-in tools and plugins that monitor the health and performance of their respective clusters, streaming log data to Amazon CloudWatch for event data analysis. These managed services reduce the operational burden of deploying and maintaining application platform infrastructure. CloudWatch Logs provides comprehensive insights into both the infrastructure and application levels of Amazon ECS and Amazon EKS clusters, enabling effective troubleshooting and analysis.
-
Security
ACM provides managed SSL/TLS certificates for secure communication and automatically renews these certificates to prevent vulnerabilities. AWS WAF protects web applications from common exploits and provides real-time monitoring and custom rule creation capabilities. Additionally, Amazon ECS and Amazon EKS clusters operate across public and private subnets for additional security and isolation. AWS Identity and Access Management (IAM) roles and policies follow the least-privilege principle for both deployment of the Guidance and cluster operations, while Secrets Manager securely stores external model provider credentials and other sensitive settings.
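As an illustration of the least-privilege principle, a task role might be granted only the single Secrets Manager action it needs, scoped to the gateway's own secrets. The account ID and secret path below are placeholders, not values from this Guidance.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "secretsmanager:GetSecretValue",
      "Resource": "arn:aws:secretsmanager:us-east-1:111122223333:secret:genai-gateway/*"
    }
  ]
}
```

Scoping `Resource` to a narrow secret prefix, rather than `*`, keeps a compromised container from reading unrelated secrets in the account.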
-
Reliability
Amazon ECS and Amazon EKS provide container orchestration, automatically handling task placement and recovery across multiple Availability Zones for LiteLLM proxy and API/middleware containers. Amazon ElastiCache enables multi-tenant distribution of application settings and prompt caching. Together, these services enable highly available applications that can maintain operational SLAs even if individual components fail, offering auto-recovery capabilities.
-
Performance Efficiency
ElastiCache enhances performance by providing sub-millisecond latency for frequently accessed data through in-memory caching. ALB effectively distributes incoming application traffic across multiple targets based on advanced routing rules and health checks. Amazon ECS on Fargate and Amazon EKS provide efficient on-demand infrastructure for running application containers, offering auto scaling based on workload demands. The native integration of LiteLLM with ElastiCache and Amazon RDS significantly reduces database load and improves application response times by serving cached content and efficiently routing requests.
-
Cost Optimization
Amazon RDS offers automated backups, patching, and scaling, reducing operational overhead and cost of operation. Reserved instances and savings plans allow you to significantly reduce costs for predictable workloads compared to on-demand pricing. Amazon ECS and Amazon EKS allow you to run containers on efficient Amazon Elastic Compute Cloud (Amazon EC2) instances, such as AWS Graviton, or on serverless Fargate infrastructure. This helps optimize compute costs by right-sizing resources and paying only for what you use.
-
Sustainability
Amazon EKS and Amazon ECS container orchestration engines enable multiple applications to share underlying compute resources (including efficient compute EC2 instances), maximizing resource utilization and reducing idle capacity. As a managed service, Amazon Bedrock eliminates the need for dedicated GPU infrastructure by sharing pre-trained models across multiple users. This shared infrastructure approach reduces the overall hardware resource footprint and energy consumption compared to running separate dedicated environments.
Related Content

Guidance for Multi-Provider Generative AI Gateway on AWS
Disclaimer
The sample code, software libraries, command line tools, proofs of concept, templates, or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.
References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.