AWS Machine Learning Blog
Build a contextual chatbot application using Amazon Bedrock Knowledge Bases
May 2024: This post was reviewed and updated to provide the chatbot application’s infrastructure as code using the AWS CDK.
Modern chatbots can serve as digital agents, providing a new avenue for delivering 24/7 customer service and support across many industries. Their popularity stems from the ability to respond to customer inquiries in real time and handle multiple queries simultaneously in different languages. Chatbots also offer valuable data-driven insights into customer behavior while scaling effortlessly as the user base grows; therefore, they present a cost-effective solution for engaging customers. Chatbots are able to use the advanced natural language capabilities of large language models (LLMs) to respond to customer questions. They can understand conversational language and respond naturally. However, chatbots that merely answer basic questions have limited utility. To become trusted advisors, chatbots need to provide thoughtful, tailored responses that can help the end-user fulfill a task.
One way to enable more contextual conversations is by linking the chatbot to internal knowledge bases and information systems. Integrating proprietary enterprise data from internal knowledge bases enables chatbots to contextualize their responses to each user’s individual needs and interests. For example, a chatbot could suggest products that match a shopper’s preferences and past purchases, explain details in language adapted to the user’s level of expertise, or provide account support by accessing the customer’s specific records. The ability to intelligently incorporate information, understand natural language, and provide customized replies in a conversational flow allows chatbots to deliver real business value across diverse use cases.
The popular architecture pattern of Retrieval Augmented Generation (RAG) is often used to augment user query context and responses. RAG combines the capabilities of LLMs with the grounding in facts and real-world knowledge that comes from retrieving relevant texts and passages from a corpus of data. These retrieved texts are then used to inform and ground the output, reducing hallucination and improving relevance.
In this post, we illustrate contextually enhancing a chatbot by using Amazon Bedrock Knowledge Bases, a fully managed serverless service. Amazon Bedrock Knowledge Bases allows your chatbot to provide more relevant, personalized responses by linking user queries to related information. It securely connects foundation models (FMs) to internal company data sources for RAG, to deliver more relevant and accurate responses. All the information retrieved from Amazon Bedrock Knowledge Bases is provided with citations to improve transparency and minimize hallucinations. For this post, we use the Amazon letters to shareholders dataset to develop this solution.
Retrieval Augmented Generation
RAG is an approach to natural language generation that incorporates information retrieval into the generation process. RAG architecture involves two key workflows: data preprocessing through ingestion, and text generation using enhanced context.
The data ingestion workflow uses LLMs to create embedding vectors that represent the semantic meaning of texts. Embeddings are created for documents and user questions. The documents are split into chunks, and the chunk embeddings are stored as indexes in a vector database. The text generation workflow then takes a question’s embedding vector and uses it to retrieve the most similar document chunks based on vector similarity. It augments prompts with these relevant chunks to generate an answer using the LLM. For more details, refer to the Primer on Retrieval Augmented Generation, Embeddings, and Vector Databases section in Preview – Connect Foundation Models to Your Company Data Sources with Agents for Amazon Bedrock.
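To make the two workflows concrete, the following is a minimal, self-contained sketch of chunking, embedding, similarity search, and prompt augmentation. It is illustrative only: the `embed` function is a toy bag-of-words counter standing in for a real embedding model, the in-memory list stands in for a vector database, and all function names are our own assumptions rather than part of any AWS API.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 40) -> list[str]:
    """Split text into fixed-size word windows (a simple stand-in for real chunking)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector. Real systems use a learned model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, index: list[tuple[str, Counter]], top_k: int = 2) -> list[str]:
    """Return the top_k chunks whose vectors are most similar to the question vector."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine_similarity(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


def build_prompt(question: str, chunks: list[str]) -> str:
    """Augment the user question with the retrieved chunks."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Ingestion workflow: chunk each document, embed each chunk, store (chunk, vector) pairs.
documents = [
    "AWS launched its first inference chips, called Inferentia, in 2019.",
    "Graviton3 chips provide better performance than Graviton2 processors.",
    "AWS revenue grew 29% year-over-year in 2022.",
]
index = [(chunk, embed(chunk)) for doc in documents for chunk in chunk_text(doc)]

# Text generation workflow: embed the question, retrieve similar chunks, augment the prompt.
question = "When were the first inference chips launched?"
top_chunks = retrieve(question, index, top_k=1)
prompt = build_prompt(question, top_chunks)
```

In a real system, the augmented `prompt` would then be sent to an LLM to generate the answer.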
The following diagram illustrates the high-level RAG architecture.

Although the RAG architecture has many advantages, it involves multiple components, including a database, retrieval mechanism, prompt, and generative model. Managing these interdependent parts can introduce complexities in system development and deployment. The integration of retrieval and generation also requires additional engineering effort and computational resources. Some open source libraries provide wrappers to reduce this overhead; however, changes to libraries can introduce errors and add versioning overhead. Even with open source libraries, significant effort is required to write code, determine optimal chunk size, generate embeddings, and more. This setup work alone can take weeks depending on data volume.
Therefore, a managed solution that handles these undifferentiated tasks could streamline and accelerate the process of implementing and managing RAG applications.
Amazon Bedrock Knowledge Bases
Amazon Bedrock Knowledge Bases is a serverless option to build powerful conversational artificial intelligence (AI) systems using RAG. It offers fully managed data ingestion and text generation workflows.
For data ingestion, Amazon Bedrock provides the StartIngestionJob API to start an ingestion job. It handles creating, storing, managing, and updating text embeddings of document data in the vector database automatically. It splits the documents into manageable chunks for efficient retrieval. The chunks are then converted to embeddings and written to a vector index, while allowing you to see the source documents when answering a question.
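As a rough sketch, starting an ingestion job from Python might look like the following. We assume `client` is a boto3 `bedrock-agent` client; the function name and the IDs in the usage comment are placeholders of our own, not values from this solution.

```python
from typing import Any


def start_ingestion(client: Any, knowledge_base_id: str, data_source_id: str) -> dict:
    """Start syncing a data source into a knowledge base and return basic job info.

    `client` is assumed to be a boto3 "bedrock-agent" client (or a compatible stub).
    """
    response = client.start_ingestion_job(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
    )
    job = response["ingestionJob"]
    return {"job_id": job["ingestionJobId"], "status": job["status"]}


# Usage (requires boto3 and AWS credentials; IDs are placeholders):
#   import boto3
#   client = boto3.client("bedrock-agent")
#   print(start_ingestion(client, "YOUR_KB_ID", "YOUR_DATA_SOURCE_ID"))
```

Passing the client in as a parameter keeps the function easy to exercise with a stub before pointing it at a real knowledge base.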
For text generation, Amazon Bedrock provides the RetrieveAndGenerate API, which creates embeddings of user queries and retrieves relevant chunks from the vector database to generate accurate responses. It also supports the source attribution and short-term memory needed for RAG applications.
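The following sketch shows one way to call RetrieveAndGenerate from Python and pull out the answer together with its source attribution. We assume `client` is a boto3 `bedrock-agent-runtime` client; the helper names are our own, and the knowledge base ID and model ARN are placeholders you would supply.

```python
from typing import Any


def build_request(question: str, knowledge_base_id: str, model_arn: str) -> dict:
    """Build keyword arguments for a knowledge-base-backed RetrieveAndGenerate call."""
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": model_arn,
            },
        },
    }


def extract_answer(response: dict) -> dict:
    """Collect the generated text and the distinct S3 URIs cited for it."""
    sources: list[str] = []
    for citation in response.get("citations", []):
        for ref in citation.get("retrievedReferences", []):
            uri = ref.get("location", {}).get("s3Location", {}).get("uri")
            if uri and uri not in sources:
                sources.append(uri)
    return {"answer": response["output"]["text"], "sources": sources}


def ask_knowledge_base(client: Any, question: str, knowledge_base_id: str, model_arn: str) -> dict:
    """Send one question to the knowledge base and return the answer with its sources."""
    request = build_request(question, knowledge_base_id, model_arn)
    return extract_answer(client.retrieve_and_generate(**request))


# Usage (requires boto3 and AWS credentials; ID and ARN are placeholders):
#   import boto3
#   client = boto3.client("bedrock-agent-runtime")
#   print(ask_knowledge_base(client, "What is Amazon doing in generative AI?",
#                            "YOUR_KB_ID", "YOUR_MODEL_ARN"))
```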
This enables you to focus on your core business applications and removes the undifferentiated heavy lifting.
Solution overview
The solution presented in this post uses a chatbot application built using the following solution architecture.

This architecture workflow includes the following steps:
- A user uploads the Amazon letters to shareholders dataset to an Amazon Simple Storage Service (Amazon S3) bucket set up as the knowledge base data source.
- Amazon S3 invokes an AWS Lambda function so the Lambda function can synchronize the data source with the knowledge base.
- The Lambda function starts data ingestion by calling the StartIngestionJob API.
- The knowledge base splits the documents in the data source into manageable chunks for efficient retrieval.
- The knowledge base is set up to use Amazon OpenSearch Serverless as its vector store and an Amazon Titan embedding text model to create the embeddings. In this step, Amazon Bedrock Knowledge Bases converts the chunks to embeddings and writes to a vector index in the OpenSearch vector store, while maintaining a mapping to the original document. For more information about vector stores, see Set up a vector index for your knowledge base in a supported vector store. For more information about supported embedding models, see Supported regions and models for Amazon Bedrock Knowledge Bases.
- A user interacts with the chatbot interface and submits a query in natural language. The chatbot frontend application is a single-page application built using React.
- This invokes a REST API created using Amazon API Gateway. A Lambda function integrated with the API invokes the RetrieveAndGenerate API for its response.
- Amazon Bedrock Knowledge Bases uses the Amazon Titan embedding model to convert the user query to a vector and finds chunks that are semantically similar to the user query. The user prompt is then augmented with the chunks that are retrieved from the knowledge base. The prompt alongside the additional context is then sent to an LLM for response generation. In this solution, we use Anthropic Claude Instant 1.2 as our LLM to generate user responses using additional context. Claude Instant 1.2 is a fast, affordable, and very capable model that can handle a range of tasks, including casual dialogue, text analysis, summarization, and document question-answering.
- The Lambda function returns the answer and the citation as part of the response.
- The user sees an answer and the citation on the chatbot user interface.
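The query path in the last few steps (API Gateway to Lambda and back to the user interface) can be sketched as a Lambda proxy handler. This is a hypothetical outline, not the code from the solution's repository: the request fields `question` and `sessionId`, the response shape, and the `make_handler` factory are assumptions made for illustration. The `ask` callable stands in for whatever function calls the RetrieveAndGenerate API.

```python
import json
from typing import Callable, Optional


def _response(status_code: int, payload: dict) -> dict:
    """Shape a result in the format API Gateway expects from a Lambda proxy integration."""
    return {
        "statusCode": status_code,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def make_handler(ask: Callable[[str, Optional[str]], dict]) -> Callable[[dict, object], dict]:
    """Create a Lambda handler that forwards the user's question to `ask`.

    `ask(question, session_id)` is assumed to return a JSON-serializable dict,
    for example one holding the answer and the citation.
    """
    def handler(event: dict, context: object) -> dict:
        try:
            body = json.loads(event.get("body") or "{}")
        except json.JSONDecodeError:
            return _response(400, {"error": "Request body must be valid JSON"})
        question = body.get("question") if isinstance(body, dict) else None
        if not isinstance(question, str) or not question.strip():
            return _response(400, {"error": "Missing 'question'"})
        return _response(200, ask(question.strip(), body.get("sessionId")))

    return handler
```

Building the handler from an injected `ask` function lets you test the HTTP handling locally without calling Amazon Bedrock.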
Prerequisites
To set up this solution, complete the following prerequisites:
- Amazon Bedrock users need to request access to FMs before they are available for use. This is a one-time action and takes less than a minute. For this solution, you’ll need to enable access to the Amazon Titan Embeddings G1 – Text and Claude Instant 1.2 models in Amazon Bedrock. For more information, refer to Model access.
- Pick an AWS Region from the Amazon Bedrock supported Regions.
- If you plan to deploy the solution from your local machine, make sure you complete the following steps:
  - Install and configure the AWS Command Line Interface (AWS CLI).
  - Install and bootstrap the AWS Cloud Development Kit (AWS CDK).
  - Download and install a long-term support (LTS) version of Node.js 20.x.
  - Have Docker installed and running.
 
An AWS Cloud9 integrated development environment (IDE) comes pre-installed with the AWS CLI and AWS CDK tools.
Clone the GitHub repo
The solution presented in this post is available in the following GitHub repo. You need to clone the GitHub repository to your local machine. Open a terminal window and run the following command (this is a single git clone command):
Deploy the solution
Complete the following steps to deploy the solution:
- From the command line, navigate to the backend folder using the following command:
- Install the dependencies using the following command:
- Use the AWS CDK to deploy the backend of the chatbot application using the following command:
As part of the allowedip context variable, provide the IP address of the chatbot client, in CIDR format, that is allowed to access the API Gateway.
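If you are unsure whether a value is in valid CIDR format, a quick check with Python's standard `ipaddress` module can help before you deploy. The following is a small sketch; the helper name `to_cidr` is our own, and the addresses come from a range reserved for documentation.

```python
import ipaddress


def to_cidr(value: str) -> str:
    """Normalize an IP address or CIDR block to CIDR notation.

    Raises ValueError if the input is not a valid IPv4/IPv6 address or network.
    """
    # strict=False accepts host bits (for example 203.0.113.10/24) and masks them off.
    network = ipaddress.ip_network(value.strip(), strict=False)
    return str(network)


single_host = to_cidr("203.0.113.10")      # a single address becomes a /32 block
whole_subnet = to_cidr("203.0.113.10/24")  # host bits are cleared for a subnet
```

A single client address is expressed as a `/32` block, which allows only that address.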
When the deployment is complete, use the outputs as shown in the following screenshot and note the API Gateway URL and DocsBucketName values:
- Note the API Gateway URL shown as the APIGatewayUrl output.
- Note the S3 bucket name shown as the DocsBucketName output.

The chatbot application backend deployed a knowledge base and S3 data source using resources from the AWS Generative AI Constructs Library for Amazon Bedrock. The AWS Generative AI Constructs Library is an open source extension of the AWS CDK that provides multi-service, well-architected patterns for quickly defining solutions in code to create predictable and repeatable infrastructure, called constructs. The goal of the AWS Generative AI CDK Constructs Library is to help developers build generative AI solutions using pattern-based definitions for their architecture.
Upload your knowledge dataset to Amazon S3
We download the dataset for our knowledge base and upload it into an S3 bucket. This dataset will feed and power the knowledge base. Complete the following steps:
- Navigate to the Annual reports, proxies and shareholder letters data repository and download the last few years of Amazon shareholder letters.
  
- On the Amazon S3 console, choose Buckets in the navigation pane.
- Navigate to the bucket name shown in the BackendStack.DocsBucketName output value.
- Upload the dataset files you downloaded to this bucket and choose Upload.
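If you prefer to script the upload instead of using the console, the following sketch shows one possible approach. We assume `s3_client` is a boto3 `s3` client and that the downloaded letters are PDF files in a local folder; the function name, folder, and bucket name are placeholders.

```python
from pathlib import Path
from typing import Any


def upload_folder(s3_client: Any, folder: str, bucket: str, suffix: str = ".pdf") -> list[str]:
    """Upload every file in `folder` ending with `suffix` to `bucket`.

    Each object key is the file name. Returns the uploaded keys in sorted order.
    """
    uploaded = []
    for path in sorted(Path(folder).glob(f"*{suffix}")):
        s3_client.upload_file(str(path), bucket, path.name)
        uploaded.append(path.name)
    return uploaded


# Usage (requires boto3 and AWS credentials; names are placeholders):
#   import boto3
#   s3 = boto3.client("s3")
#   upload_folder(s3, "./shareholder-letters", "YOUR_DOCS_BUCKET_NAME")
```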
Test the contextual chatbot application
To test your chatbot application, complete the following steps:
- Open a new terminal or command line window on your machine.
- Navigate to the root folder called amazon-bedrock-rag and change the directory to the frontend folder.
- Install the dependencies using the npm install command.
- Use npm run start to launch the chatbot application user interface in your browser.
- On the user interface, for Step 1 – Enter API URL, enter the API Gateway endpoint URL noted previously as part of the BackendStack.APIGatewayUrl output.
- Enter your question for Step 2 and press Enter to receive an answer and a citation.

The following table includes some sample questions and related knowledge base responses. Try out some of these questions by using prompts.
| Questions | Answers |
| --- | --- |
| What is Amazon doing in the field of generative AI? | Amazon has been working on their own large language models (LLMs) for generative AI and believes it will transform and improve every customer experience. They plan to continue investing substantially in these models across all their consumer, seller, brand, and creator experiences. | 
| What is AWS year-over-year revenue in 2022? | AWS revenue grew 29% year-over-year in 2022 on a $62 billion revenue base. | 
| How many days has Amazon asked employees to come to work in office? | Amazon has asked corporate employees to come back to office at least three days a week beginning May 2022. | 
| By what percentage did AWS revenue grow year-over-year in 2022? | AWS had a 29% year-over-year (‘YoY’) revenue in 2022. | 
| Compared to Graviton2 processors, what performance improvement did Graviton3 chips deliver according to the passage? | In 2022, AWS delivered their Graviton3 chips, providing 25% better performance than the Graviton2 processors. | 
| Which was the first inference chip launched by AWS according to the passage? | AWS launched their first inference chips (“Inferentia”) in 2019, and they have saved companies like Amazon over a hundred million dollars in capital expense. | 
| According to the context, in what year did Amazon’s annual revenue increase from $245B to $434B? | Amazon’s annual revenue increased from $245B in 2019 to $434B in 2022. | 
| Tell me again what was the revenue in 2019? | Amazon’s revenue in 2019 was $245 billion. | 
| and, 2021? | Amazon’s revenue in 2021 was $469.8 billion, an increase of 22% over 2020. | 
| And, remind me again when was the first inference chip was launched? | Amazon’s first inference chip was Inferentia, which launched in 2019. | 
During the first call to the Lambda function, the RetrieveAndGenerate API returns a sessionId, which is then passed by the React app along with the subsequent user prompt as an input to the RetrieveAndGenerate API to continue the conversation in the same session. The RetrieveAndGenerate API manages the short-term memory and uses the chat history as long as the same sessionId is passed as an input in the successive calls.
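The session handling described above can be sketched in a few lines. In the solution, the React app holds the sessionId; here a small Python class plays that role purely for illustration. We assume `client` is a boto3 `bedrock-agent-runtime` client, and the class name and constructor arguments are our own.

```python
from typing import Any, Optional


class ChatSession:
    """Keeps the sessionId between calls so follow-up questions share chat history."""

    def __init__(self, client: Any, knowledge_base_id: str, model_arn: str) -> None:
        self._client = client
        self._config = {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": model_arn,
            },
        }
        self.session_id: Optional[str] = None  # unknown until the first response arrives

    def ask(self, question: str) -> str:
        kwargs = {
            "input": {"text": question},
            "retrieveAndGenerateConfiguration": self._config,
        }
        # The first call omits sessionId; later calls pass the one returned by the API.
        if self.session_id is not None:
            kwargs["sessionId"] = self.session_id
        response = self._client.retrieve_and_generate(**kwargs)
        self.session_id = response["sessionId"]
        return response["output"]["text"]
```

With this pattern, a follow-up such as "and, 2021?" is sent with the same sessionId as the question before it, so the API can resolve it against the earlier turns.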
Congratulations, you have successfully created and tested a chatbot application using Amazon Bedrock Knowledge Bases.
Clean up
Failing to delete resources such as the S3 bucket, OpenSearch Serverless collection, and knowledge base will incur charges. To clean up these resources, run the following command from the project’s folder called amazon-bedrock-rag/backend:
Conclusion
In this post, we provided an overview of contextual chatbots and explained why they’re important. We described the complexities involved in data ingestion and text generation workflows for a RAG architecture. We then introduced how Amazon Bedrock Knowledge Bases creates a fully managed serverless RAG system, including a vector store. Finally, we provided a solution architecture and sample code in a GitHub repo to retrieve and generate contextual responses for a chatbot application using a knowledge base.
By explaining the value of contextual chatbots, the challenges of RAG systems, and how Amazon Bedrock Knowledge Bases addresses those challenges, this post aimed to showcase how Amazon Bedrock enables you to build sophisticated conversational AI applications with minimal effort.
For more information, see the Amazon Bedrock Developer Guide and Knowledge Base APIs.
About the Authors
 Manish Chugh is a Principal Solutions Architect at AWS based in San Francisco, CA. He specializes in machine learning and generative AI. He works with organizations ranging from large enterprises to early-stage startups on problems related to machine learning. His role involves helping these organizations architect scalable, secure, and cost-effective workloads on AWS. He regularly presents at AWS conferences and other partner events. Outside of work, he enjoys hiking on East Bay trails, road biking, and watching (and playing) cricket.
 Mani Khanuja is a Tech Lead – Generative AI Specialists, author of the book Applied Machine Learning and High Performance Computing on AWS, and a member of the Board of Directors for the Women in Manufacturing Education Foundation. She leads machine learning projects in various domains such as computer vision, natural language processing, and generative AI. She speaks at internal and external conferences such as AWS re:Invent, Women in Manufacturing West, YouTube webinars, and GHC 23. In her free time, she likes to go for long runs along the beach.
 Pallavi Nargund is a Principal Solutions Architect at AWS. In her role as a cloud technology enabler, she works with customers to understand their goals and challenges, and gives prescriptive guidance to help them achieve their objectives with AWS offerings. She is passionate about women in technology and is a core member of Women in AI/ML at Amazon. She speaks at internal and external conferences such as AWS re:Invent, AWS Summits, and webinars. Outside of work, she enjoys volunteering, gardening, cycling, and hiking.
 Anand Komandooru is a Principal Cloud Architect at AWS. He joined the AWS Professional Services organization in 2021 and helps customers build cloud-native applications on the AWS Cloud. He has over 20 years of experience building software, and his favorite Amazon leadership principle is “Leaders are right a lot.”
 Fabiano Meneses is a Principal Cloud Application Architect with AWS Professional Services. He is a highly passionate IT professional with over 25 years of international experience in designing and implementing solutions to deliver business outcomes for customers. His current focus is building cloud-native distributed systems, with a keen interest in serverless technologies.