Optimize for cost, latency, and accuracy
Boost accuracy and performance while controlling cost with tools to evaluate and optimize AI at every stage
Never compromise on performance
Amazon Bedrock helps you strike the right balance between cost, latency, and accuracy—so your generative AI applications perform efficiently without overspending. With features like Model Distillation, Intelligent Prompt Routing, prompt caching, and flexible inference options including on-demand, batch, and provisioned throughput, Bedrock gives you the control to optimize across use cases and scale with confidence. Whether you're serving real-time or batch workloads, Bedrock lets you build smarter, leaner, and more cost-effective AI systems.

Improve performance. Reduce costs.
Use prompt caching to reduce costs by up to 90% and latency by up to 85% for supported models
Many foundation model (FM) use cases will reuse certain portions of prompts (prefixes) across API calls. With prompt caching, supported models will let you cache these repeated prompt prefixes between requests. This cache lets the model skip recomputation of matching prefixes.
- Improve performance for multiple use cases
- Cache the relevant portions of your prompt to save on input token costs
- Integrate with other Amazon Bedrock features to accelerate multi-step tasks, or use longer system prompts to refine agent behavior without slowing down responses
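The caching behavior described above can be sketched with the Converse API in boto3: you insert a cache checkpoint after the reusable portion of the prompt, so subsequent requests with the same prefix skip recomputation. The model ID and system prompt below are illustrative assumptions, and cache support varies by model.

```python
# Sketch: marking a reusable prompt prefix for caching via the Bedrock
# Converse API (boto3). Model ID and prompt text are placeholders.

LONG_SYSTEM_PROMPT = "You are a contract-review assistant. ..."  # reused prefix

def build_cached_converse_request(model_id: str, question: str) -> dict:
    """Build a Converse request whose system prefix is marked cacheable."""
    return {
        "modelId": model_id,
        "system": [
            {"text": LONG_SYSTEM_PROMPT},
            # Everything before this checkpoint can be cached and reused
            # across requests, skipping recomputation of the prefix.
            {"cachePoint": {"type": "default"}},
        ],
        "messages": [
            {"role": "user", "content": [{"text": question}]}
        ],
    }

request = build_cached_converse_request(
    "anthropic.claude-3-5-sonnet-20241022-v2:0",  # placeholder model ID
    "Summarize the termination clause.",
)
# With AWS credentials configured, you would send it with:
# import boto3
# client = boto3.client("bedrock-runtime")
# response = client.converse(**request)
```

Because only the tokens before the checkpoint are cached, put static instructions and documents first and the per-request question last.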

Accelerate prompt engineering for generative AI applications
Amazon Bedrock Prompt Management simplifies the creation, evaluation, versioning, and running of prompts, helping developers get the best responses from foundation models for their use cases.
- Prompt Management lets you test different foundation models, configurations, tools, and instructions
- Prompt optimization in Prompt Management automatically rewrites prompts to improve accuracy and provide more concise responses from foundation models
- Test prompts with the latest foundation models instantly without any deployment
- Quickly build generative AI applications and collaborate on prompt creation in Amazon SageMaker Unified Studio
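As a rough sketch of the workflow above, a managed prompt can be defined with the boto3 `bedrock-agent` client and then versioned. The prompt name, template text, and model ID here are illustrative assumptions.

```python
# Sketch: defining a versionable prompt with Bedrock Prompt Management
# (boto3 "bedrock-agent" client). Names and template text are assumptions.

def build_prompt_definition(name: str, model_id: str) -> dict:
    """Build the kwargs for a create_prompt call."""
    return {
        "name": name,
        "variants": [
            {
                "name": "variant-1",
                "modelId": model_id,
                "templateType": "TEXT",
                "templateConfiguration": {
                    "text": {
                        # {{ticket}} is filled in at invocation time.
                        "text": "Summarize the following ticket: {{ticket}}",
                        "inputVariables": [{"name": "ticket"}],
                    }
                },
            }
        ],
        "defaultVariant": "variant-1",
    }

definition = build_prompt_definition("ticket-summarizer", "amazon.nova-lite-v1:0")
# With AWS credentials configured:
# import boto3
# agent = boto3.client("bedrock-agent")
# prompt = agent.create_prompt(**definition)
# agent.create_prompt_version(promptIdentifier=prompt["id"])  # snapshot a version
```

Keeping each configuration as a variant lets you test different models and instructions side by side before promoting one as the default.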

Maximize performance at lower cost with Intelligent Prompt Routing
Amazon Bedrock Intelligent Prompt Routing routes prompts to different foundation models within a model family, helping you optimize for quality of responses and cost. Intelligent Prompt Routing can reduce costs by up to 30% without compromising on accuracy.
- Bedrock will dynamically route requests to the model that it predicts is most likely to give the desired response at the lowest cost
- Reduce the development effort spent testing different models and building complex orchestration workflows by selecting default prompt routers provided by Amazon Bedrock, or by configuring your own
- Easily debug with fully traceable requests
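In practice, routing works by passing a prompt router's ARN where a model ID would normally go. A minimal sketch, assuming a placeholder router ARN:

```python
# Sketch: sending a request through an Intelligent Prompt Router instead
# of a fixed model. The ARN below is a placeholder assumption.

ROUTER_ARN = (
    "arn:aws:bedrock:us-east-1:123456789012:"
    "default-prompt-router/anthropic.claude:1"  # placeholder router ARN
)

def build_routed_request(prompt: str) -> dict:
    """Build a Converse request addressed to a router, not a model."""
    return {
        "modelId": ROUTER_ARN,  # the router picks the underlying model
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
    }

request = build_routed_request("What is the capital of France?")
# With AWS credentials configured:
# import boto3
# client = boto3.client("bedrock-runtime")
# response = client.converse(**request)
# The response trace records which underlying model the router invoked,
# which is what makes routed requests debuggable.
```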

Distilled models in Amazon Bedrock are up to 500% faster and up to 75% less expensive than original models, with less than 2% accuracy loss for use cases like RAG
Use smaller, faster, more cost-effective models that deliver use-case specific accuracy—comparable to the most advanced models in Amazon Bedrock.
- Fine-tune a ‘student’ model with a ‘teacher’ model that has the accuracy you want
- Maximize distilled model performance with proprietary data synthesis
- Reduce cost by bringing your production data. Model Distillation lets you provide prompts, then uses them to generate synthetic responses and fine-tune the student model
- Boost function-calling accuracy for agents. Smaller distilled models can predict function calls accurately, delivering substantially faster response times at lower operational cost
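The teacher/student setup above maps to a model customization job with a distillation type. The sketch below builds such a job request; the ARNs, bucket paths, and some nested field names are assumptions, so verify them against the current `create_model_customization_job` API shape before use.

```python
# Sketch: starting a distillation job where a larger "teacher" model
# generates responses used to fine-tune a smaller "student" model.
# Role ARN, S3 URIs, and model IDs are placeholders.

def build_distillation_job(job_name: str) -> dict:
    """Build the kwargs for a model customization job of type DISTILLATION."""
    return {
        "jobName": job_name,
        "customModelName": f"{job_name}-student",
        "roleArn": "arn:aws:iam::123456789012:role/BedrockDistillationRole",  # placeholder
        "customizationType": "DISTILLATION",
        # Student: the smaller, cheaper model being fine-tuned.
        "baseModelIdentifier": "amazon.nova-lite-v1:0",
        "customizationConfig": {
            "distillationConfig": {
                "teacherModelConfig": {
                    # Teacher: the larger model whose accuracy you want.
                    "teacherModelIdentifier": "amazon.nova-pro-v1:0",
                }
            }
        },
        # Your production prompts; Bedrock generates the synthetic responses.
        "trainingDataConfig": {"s3Uri": "s3://my-bucket/prompts.jsonl"},  # placeholder
        "outputDataConfig": {"s3Uri": "s3://my-bucket/output/"},  # placeholder
    }

job = build_distillation_job("rag-distillation")
# With AWS credentials configured:
# import boto3
# bedrock = boto3.client("bedrock")
# bedrock.create_model_customization_job(**job)
```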
