AWS Cloud Operations Blog
Using Amazon Bedrock and Amazon Nova for AI-Powered Incident Response
In today’s cloud-native world, incident response teams face overwhelming challenges. When critical applications fail, engineers must sift through mountains of observability data across multiple services; all while under intense pressure to restore service quickly. This manual correlation process is time-consuming, error-prone, and often delays resolution, resulting in extended outages and frustrated customers. Traditional monitoring tools alert you to problems but leave complex analysis to humans, creating a significant operational bottleneck.
In this blog, you will learn how to use Amazon Bedrock and Amazon Nova Pro to create an AI-powered incident response system. Amazon Bedrock offers a flexible foundation for building tailored generative AI solutions that can enhance incident response processes by incorporating AWS observability tools data, third-party data sources, and application architecture diagrams. Amazon Bedrock offers a serverless API that gives you access to state-of-the-art foundation models, while Amazon Nova Pro provides advanced multimodal capabilities that can process both text and visual data simultaneously.
By combining these powerful AI services with AWS observability tools, you can develop a system that automatically ingests Amazon CloudWatch metrics, AWS Config changes, AWS X-Ray traces, and architecture diagrams to provide comprehensive incident analysis. The solution not only identifies potential causes of outages but also ranks them by probability, suggests specific troubleshooting steps, and even generates appropriate customer communications; all without requiring deep expertise in machine learning or data science.
Solution overview

Figure 1 – Solution Architecture
At a high level, the solution works through the following process.
- Data collection: Collect and correlate data from infrastructure and observability data sources such as Amazon CloudWatch, AWS Config, and AWS X-Ray.
- Data storage: When an incident occurs, the `fetch-obsv-data.sh` script captures relevant data for a specific duration during the outage and stores it in an Amazon Simple Storage Service (Amazon S3) bucket for analysis.
- AI analysis: The `bedrock-demo-nova-pro.py` script invokes Amazon Bedrock with Amazon Nova Pro to process both textual and visual data, applying advanced AI reasoning to understand the system state.
- Insight generation and resolution: The AI model produces comprehensive insights including ranked probable causes, specific troubleshooting steps, and suggested customer communications. Operations teams leverage these AI-generated insights to quickly implement fixes and restore service, dramatically reducing the Mean Time to Resolution.
Prerequisites
- An AWS account with access to Amazon Bedrock and Amazon Nova Pro.
- An IAM user/role with required permissions to access Amazon Bedrock, CloudWatch, AWS Config, AWS X-Ray, and Amazon S3.
- The AWS CLI configured with appropriate credentials.
- Python 3.x installed locally.
- boto3 Python library installed.
- jq (JSON processor) for processing observability data.
- Amazon S3 bucket to store observability data and architecture diagrams.
Walk-through
This walk-through utilizes PetShop as a sample application. For PetShop deployment, refer to the One Observability Workshop which provides comprehensive guidance on setting up the PetShop sample application.
Step 1: Clone the repository and set up your environment
- Clone the github repository.
git clone https://github.com/aws-samples/sample-aiops-nova-demo.git
- Navigate to the project directory.
cd sample-aiops-nova-demo.git
- Install the required Python packages.
pip install boto3 botocore
- Install jq.
sudo apt-get install jq
(Linux)
Use the jq website for binaries and installation instructions for different operating systems
Step 2: Create an Amazon S3 Bucket for storing observability data
- Replace ‘your-region’ and ‘your-unique-bucket-name’ with your values.
aws s3 mb s3://your-unique-bucket-name --region your-region
Step 3: Upload your architecture diagram
- Upload your application architecture diagram to the Amazon S3 bucket.
aws s3 cp app_diagram.png s3://your-unique-bucket-name/
Step 4: When an incident occurs, run the fetch script to collect data:
- To simulate an outage, modify the security group attached to the elastic load balancer (ELB) by changing the inbound rule from HTTP (port 80) to a different unused port number. This will effectively block incoming traffic to the PetSite application.
chmod +x fetch-obsv-data.sh
./fetch-obsv-data.sh your-region your-unique-bucket-name
This script will:
- Run cwreport.py to collect CloudWatch metrics.
- Query AWS Config for configuration changes.
- Extract AWS X-Ray traces for the application.
- Upload all data to your Amazon S3 bucket.
Step 5: Analyze the incident with Amazon Bedrock
- Run the Amazon Bedrock script to analysis the collected data.
python bedrock-demo-nova-pro.py your-region your-unique-bucket-name
This script will:
- Download the data from Amazon S3.
- Construct a prompt for Amazon Nova Pro that includes all data sources.
- Invoke Amazon Bedrock with multimodal input.
- Process and display the AI-generated insights.
Step 6: Review AI recommendations
The output provided will include:
- Ranked list of probable incident causes
- Analysis of recent configuration changes
- Specific troubleshooting steps
- Suggested customer communications
See the output-example.txt file to see an Amazon Nova Pro model sample response. This approach transforms traditional incident management by automating the most time-consuming aspects of troubleshooting while providing clear, actionable guidance to your operations team.
Resources
You can access the code from this blog at https://github.com/aws-samples/sample-aiops-nova-demo.
Cleaning up
To avoid ongoing charges in your AWS account, you should delete any AWS resources created in following this blog post.
Conclusion
As we’ve demonstrated throughout this blog, combining AWS observability services with generative AI creates a powerful new paradigm for incident response. By automating the analysis of complex, multi-dimensional data, you can dramatically reduce MTTR while improving the quality of their incident communications. This approach doesn’t just solve today’s operational challenges; it scales to meet the growing complexity of modern cloud architecture.
The solution we’ve built represents just the beginning of what’s possible. As foundation models continue to evolve, their ability to understand complex systems and provide actionable insights will only improve. Organizations that embrace these technologies now will be well-positioned to maintain reliable services even as their infrastructure grows in complexity.
Here’s how to get started:
- Get started with AWS Observability solutions: Enhance your observability foundation by implementing comprehensive monitoring to ensure you’re capturing the data needed for effective analysis.
- Explore Amazon Bedrock and its foundation model offerings—particularly Amazon Nova Pro’s multimodal capabilities that can process both text and visual information simultaneously.
- Join the AWS Generative AI community to stay updated on the latest advancements and best practices for applying these technologies to operational challenges.
- See the One Observability Workshop that provides a hands-on experience for the wide variety of toolsets AWS offers to setup monitoring and observability of your applications.
Acknowledgment: Special thanks to Katreena Mullican (former AWS employee) for her contributions in making this project successful.