This Guidance demonstrates how to streamline access to numerous large language models (LLMs) through a unified, industry-standard API gateway based on OpenAI API standards. By deploying this Guidance, you can simplify integration while gaining access to tools that track LLM usage, manage costs, and implement crucial governance features. This allows easy switching between models, efficient management of multiple LLM services within applications, and robust control over security and expenses.

Note: [Disclaimer]

Architecture Diagram

[Architecture diagram description]

Download the architecture diagram PDF 

Well-Architected Pillars

The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.

The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

  • LiteLLM application logs are stored in S3 buckets for audit and analysis purposes. Amazon ECS and Amazon EKS feature built-in tools and plugins to monitor health and performance of their respective clusters, streaming log data to Amazon CloudWatch for event data analysis. These managed services reduce the operational burden of deploying and maintaining application platform infrastructure. CloudWatch Logs provide comprehensive insights into both infrastructure and application levels of Amazon ECS and Amazon EKS clusters, enabling effective troubleshooting and analysis.

    Read the Operational Excellence whitepaper 
  • ACM provides managed SSL/TLS certificates for secure communication and automatically manages these certificates to prevent vulnerabilities. AWS WAF protects web applications from common exploits and provides real-time monitoring and custom rule creation capabilities. Additionally, Amazon ECS and Amazon EKS clusters operate with public and private networks for additional security and isolation. AWS Identity and Access Management (IAM) roles and policies follow the least-privilege principle for both deployment of the Guidance and cluster operations, while Secrets Manager stores external model provider credentials and other sensitive settings securely.

    Read the Security whitepaper 
  • Amazon ECS and Amazon EKS provide container orchestration, automatically handling task placement and recovery across multiple Availability Zones for LiteLLM proxy and API/middleware containers. Amazon ElastiCache enables multi-tenant distribution of application settings and prompt caching. Together, these services enable highly available applications that can maintain operational SLAs even if individual components fail, offering auto-recovery capabilities.

    Read the Reliability whitepaper 
  • ElastiCache enhances performance by providing sub-millisecond latency for frequently accessed data through in-memory caching. ALB effectively distributes incoming application traffic across multiple targets based on advanced routing rules and health checks. Amazon ECS on Fargate and Amazon EKS provide on-demand efficient infrastructure for running application containers, offering auto-scaling based on workload demands. The native integration of LiteLLM with ElastiCache and Amazon RDS significantly reduces database load and improves application response times by serving cached content and efficiently routing requests.

    Read the Performance Efficiency whitepaper 
  • Amazon RDS offers automated backups, patching, and scaling, reducing operational overhead and cost of operation. These services provide options for reserved instances or savings plans, allowing you to significantly reduce costs for predictable workloads compared to on-demand pricing. Amazon ECS and Amazon EKS allow you to run containers on efficient compute Amazon Elastic Compute Cloud (Amazon EC2) instances, such as AWS Graviton, or in a serverless Fargate infrastructure. This helps optimize compute costs by right-sizing resources and only paying for what you use.

    Read the Cost Optimization whitepaper 
  • Amazon EKS and Amazon ECS container orchestration engines enable multiple applications to share underlying compute resources (including efficient compute EC2 instances), maximizing resource utilization and reducing idle capacity. As a managed service, Amazon Bedrock eliminates the need for dedicated GPU infrastructure by sharing pre-trained models across multiple users. This shared infrastructure approach reduces the overall hardware resource footprint and energy consumption compared to running separate dedicated environments.

    Read the Sustainability whitepaper 
Workshop

Guidance for Multi-Provider Generative AI Gateway on AWS

This workshop provides an overview of Guidance for Multi-Provider Generative AI Gateway on AWS, its reference architecture and components, considerations for planning the deployment, and configuration steps for deploying the Guidance.

Disclaimer

The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.

References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.

Was this page helpful?