Amazon Bedrock Model Distillation
Overview
                   With Amazon Bedrock Model Distillation, you can use smaller, faster, more cost-effective models that deliver use-case specific accuracy that is comparable to the most advanced models in Amazon Bedrock. Distilled models in Amazon Bedrock are up to 500% faster and up to 75% less expensive than original models, with less than 2% accuracy loss for use cases like RAG. 
                 
 
                Utilize smaller, more cost-effective models
                   With Model Distillation, customers can select a ‘teacher’ model whose accuracy they want to achieve for their use-case and then select a ‘student’ model that they want to fine-tune. Customers also provide prompts for their use-case. Model Distillation automates the process of generating responses from the teacher and using those responses to fine-tune the student model. Student models can then behave like teacher models with similar accuracy at reduced costs. Model Distillation supports a variety of models from different model providers, including Amazon Nova Premier (teacher) and Nova Pro (student), Claude 3.5 Sonnet v2 (teacher), Llama 3.3 70B (teacher) and Llama 3.2 1B/3B (student). Specific custom models can be invoked via on-demand inference, helping reduce the need for always-on infrastructure. Please refer to model list 
                  here. 
                 
 
                 
 
                Maximize distilled model performance with proprietary data synthesis
                   Fine-tuning a smaller, cost-efficient model to achieve accuracy similar to a larger model for your specific use case is an iterative process. To remove some of the burden of iteration needed to achieve better results, Model Distillation may choose to apply different data synthesis methods that are best suited for your use-case. For example, Bedrock may expand the training dataset by generating similar prompts or generate high-quality synthetic responses using customer provided prompt-response pairs as golden examples. 
                 
 
                 
 
                Reduce cost by easily bringing your production data
                   With traditional fine-tuning, customers are required to create prompts and responses. With Model Distillation, customers only need to provide prompts, which Model Distillation then uses to generate synthetic responses and fine-tune the student models. Customers can direct us to their invocation logs and also filter out the logs based on certain metadata fields. Model distillation can read both prompts and responses via invocation logs and skip synthetic response generation in the Model Distillation workflow, which reduces cost by not having to generate responses from the teacher model again. Get started with 
                  code samples. 
                 
 
                 
 
                Boost function calling prediction accuracy for Agents
                   Agent function calling represents a critical capability for modern AI applications, allowing models to interact with external tools, databases, and APIs by accurately determining when and how to invoke specific functions. While larger models typically excel at identifying the appropriate functions to call and constructing proper parameters, they typically come with higher costs and latency. Amazon Bedrock Model Distillation helps enable smaller models to predict function calling accurately to help deliver substantially faster response times and lower operational costs.