Optimize for cost, latency, and accuracy
Boost accuracy and performance while controlling cost with tools to evaluate and optimize AI at every stage
Never compromise on performance
Amazon Bedrock helps you strike the right balance between cost, latency, and accuracy—so your generative AI applications perform efficiently without overspending. With features like Model Distillation, Intelligent Prompt Routing, prompt caching, and flexible inference options including on-demand, batch, and provisioned throughput, Bedrock gives you the control to optimize across use cases and scale with confidence. Whether you're serving real-time or batch workloads, Bedrock lets you build smarter, leaner, and more cost-effective AI systems.

Improve performance. Reduce costs.
Use prompt caching to reduce costs by up to 90% and latency by up to 85% for supported models
Many foundation model (FM) use cases will reuse certain portions of prompts (prefixes) across API calls. With prompt caching, supported models will let you cache these repeated prompt prefixes between requests. This cache lets the model skip recomputation of matching prefixes.
- Improve performance for multiple use cases
- Cache the relevant portions of your prompt to save on input token costs
- Integrate with other Amazon Bedrock features to accelerate multi-step tasks, or use longer system prompts to refine agent behavior without slowing down responses
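The caching behavior described above can be sketched with the Converse API in boto3: you insert a cache checkpoint after the reusable portion of the prompt, so subsequent requests with the same prefix skip recomputation. The model ID and system prompt below are illustrative assumptions, and cache support varies by model.

```python
# Sketch: marking a reusable prompt prefix for caching via the Bedrock
# Converse API (boto3). Model ID and prompt text are placeholders.

LONG_SYSTEM_PROMPT = "You are a contract-review assistant. ..."  # reused prefix

def build_cached_converse_request(model_id: str, question: str) -> dict:
    """Build a Converse request whose system prefix is marked cacheable."""
    return {
        "modelId": model_id,
        "system": [
            {"text": LONG_SYSTEM_PROMPT},
            # Everything before this checkpoint can be cached and reused
            # across requests, skipping recomputation of the prefix.
            {"cachePoint": {"type": "default"}},
        ],
        "messages": [
            {"role": "user", "content": [{"text": question}]}
        ],
    }

request = build_cached_converse_request(
    "anthropic.claude-3-5-sonnet-20241022-v2:0",  # placeholder model ID
    "Summarize the termination clause.",
)
# With AWS credentials configured, you would send it with:
# import boto3
# client = boto3.client("bedrock-runtime")
# response = client.converse(**request)
```

Because only the tokens before the checkpoint are cached, put static instructions and documents first and the per-request question last.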

Accelerate prompt engineering for generative AI applications
Amazon Bedrock Prompt Management simplifies the creation, evaluation, versioning, and running of prompts, helping developers get the best responses from foundation models for their use cases.
- Prompt Management lets you test different foundation models, configurations, tools, and instructions
- Prompt optimization in Prompt Management automatically rewrites prompts to improve accuracy and provide more concise responses from foundation models
- Test prompts with the latest foundation models instantly without any deployment
- Quickly build generative AI applications and collaborate on prompt creation in Amazon SageMaker Unified Studio
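As a rough sketch of the workflow above, a managed prompt can be defined with the boto3 `bedrock-agent` client and then versioned. The prompt name, template text, and model ID here are illustrative assumptions.

```python
# Sketch: defining a versionable prompt with Bedrock Prompt Management
# (boto3 "bedrock-agent" client). Names and template text are assumptions.

def build_prompt_definition(name: str, model_id: str) -> dict:
    """Build the kwargs for a create_prompt call."""
    return {
        "name": name,
        "variants": [
            {
                "name": "variant-1",
                "modelId": model_id,
                "templateType": "TEXT",
                "templateConfiguration": {
                    "text": {
                        # {{ticket}} is filled in at invocation time.
                        "text": "Summarize the following ticket: {{ticket}}",
                        "inputVariables": [{"name": "ticket"}],
                    }
                },
            }
        ],
        "defaultVariant": "variant-1",
    }

definition = build_prompt_definition("ticket-summarizer", "amazon.nova-lite-v1:0")
# With AWS credentials configured:
# import boto3
# agent = boto3.client("bedrock-agent")
# prompt = agent.create_prompt(**definition)
# agent.create_prompt_version(promptIdentifier=prompt["id"])  # snapshot a version
```

Keeping each configuration as a variant lets you test different models and instructions side by side before promoting one as the default.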

Maximize performance at lower cost with Intelligent Prompt Routing
Amazon Bedrock Intelligent Prompt Routing routes prompts to different foundation models within a model family, helping you optimize for quality of responses and cost. Intelligent Prompt Routing can reduce costs by up to 30% without compromising on accuracy.
- Bedrock will dynamically route requests to the model that it predicts is most likely to give the desired response at the lowest cost
- Reduce the development effort spent testing different models and building complex orchestration workflows by selecting default prompt routers provided by Amazon Bedrock, or by configuring your own
- Easily debug with fully traceable requests
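In practice, routing works by passing a prompt router's ARN where a model ID would normally go. A minimal sketch, assuming a placeholder router ARN:

```python
# Sketch: sending a request through an Intelligent Prompt Router instead
# of a fixed model. The ARN below is a placeholder assumption.

ROUTER_ARN = (
    "arn:aws:bedrock:us-east-1:123456789012:"
    "default-prompt-router/anthropic.claude:1"  # placeholder router ARN
)

def build_routed_request(prompt: str) -> dict:
    """Build a Converse request addressed to a router, not a model."""
    return {
        "modelId": ROUTER_ARN,  # the router picks the underlying model
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
    }

request = build_routed_request("What is the capital of France?")
# With AWS credentials configured:
# import boto3
# client = boto3.client("bedrock-runtime")
# response = client.converse(**request)
# The response trace records which underlying model the router invoked,
# which is what makes routed requests debuggable.
```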

Distilled models in Amazon Bedrock are up to 500% faster and up to 75% less expensive than original models, with less than 2% accuracy loss for use cases like RAG
Use smaller, faster, more cost-effective models that deliver use-case specific accuracy—comparable to the most advanced models in Amazon Bedrock.
- Fine-tune a ‘student’ model with a ‘teacher’ model that has the accuracy you want
- Maximize distilled model performance with proprietary data synthesis
- Reduce cost by bringing your production data. Model Distillation lets you provide prompts, then uses them to generate synthetic responses and fine-tune the student model
- Boost function-calling accuracy for agents. Smaller distilled models can predict function calls accurately, delivering substantially faster response times at lower operational cost
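The teacher/student setup above maps to a model customization job with a distillation type. The sketch below builds such a job request; the ARNs, bucket paths, and some nested field names are assumptions, so verify them against the current `create_model_customization_job` API shape before use.

```python
# Sketch: starting a distillation job where a larger "teacher" model
# generates responses used to fine-tune a smaller "student" model.
# Role ARN, S3 URIs, and model IDs are placeholders.

def build_distillation_job(job_name: str) -> dict:
    """Build the kwargs for a model customization job of type DISTILLATION."""
    return {
        "jobName": job_name,
        "customModelName": f"{job_name}-student",
        "roleArn": "arn:aws:iam::123456789012:role/BedrockDistillationRole",  # placeholder
        "customizationType": "DISTILLATION",
        # Student: the smaller, cheaper model being fine-tuned.
        "baseModelIdentifier": "amazon.nova-lite-v1:0",
        "customizationConfig": {
            "distillationConfig": {
                "teacherModelConfig": {
                    # Teacher: the larger model whose accuracy you want.
                    "teacherModelIdentifier": "amazon.nova-pro-v1:0",
                }
            }
        },
        # Your production prompts; Bedrock generates the synthetic responses.
        "trainingDataConfig": {"s3Uri": "s3://my-bucket/prompts.jsonl"},  # placeholder
        "outputDataConfig": {"s3Uri": "s3://my-bucket/output/"},  # placeholder
    }

job = build_distillation_job("rag-distillation")
# With AWS credentials configured:
# import boto3
# bedrock = boto3.client("bedrock")
# bedrock.create_model_customization_job(**job)
```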
