AWS Cloud Operations Blog

Enhance your AIOps: Introducing Amazon CloudWatch and Application Signals MCP servers

Modern architectures generate vast amounts of observability data across metrics, logs, and traces. When issues arise, teams spend hours—sometimes days—manually correlating information across multiple dashboards to identify root causes, directly impacting MTTR and productivity. Amazon CloudWatch Application Signals addresses this challenge by providing deep application visibility through automatic instrumentation, capturing key metrics like latency, error rates, request volumes, and distributed traces. While its intuitive interface accelerates troubleshooting by surfacing critical insights and correlations, we can enhance this capability further.

By leveraging generative AI to augment this powerful toolset, we can identify root causes even faster. This is where the Model Context Protocol (MCP) by Anthropic comes in—an open-source protocol that standardizes how applications provide context to Large Language Models (LLMs). MCP transforms troubleshooting of complex systems by connecting observability data directly to AI models, enabling intelligent, context-aware analysis that significantly reduces investigation time.

We recently launched two new MCP servers for Amazon CloudWatch and Application Signals. The Amazon CloudWatch MCP server serves as a unified platform for interacting with CloudWatch’s suite of monitoring and observability tools. It enables alarm-based incident response, alarm recommendations, metric and log analysis, log pattern detection, and more. Complementing the CloudWatch MCP server, the Application Signals MCP server focuses on monitoring service health, analyzing performance metrics, tracking Service Level Objective (SLO) compliance, and investigating issues using distributed tracing. These MCP servers can be integrated with various AI assistants, including Amazon Q, Claude Code, GitHub Copilot, and others, enabling natural language interactions with your observability data.

In this blog post, we’ll demonstrate how to leverage the Amazon Q Developer CLI with these MCP servers to transform your operational workflows. You’ll learn how to identify performance bottlenecks, resolve permissions issues, optimize alarm configurations, and accelerate incident remediation – all through intuitive, conversation-style interactions that replace traditional manual efforts.

Prerequisites

  1. An AWS account with applications ingesting telemetry (metrics, traces, and logs) to Amazon CloudWatch
  2. Application Signals enabled for your applications
  3. Configure AWS credentials with the minimum permissions required for the CloudWatch and Application Signals MCP servers to securely access and interact with your AWS resources. Follow the principle of least privilege when granting permissions, providing only the necessary access for the MCP servers to query CloudWatch metrics, logs, and alarms, as well as access Application Signals data.
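
As a starting point for the least-privilege requirement above, the sketch below builds a read-only IAM policy document. The action list is an assumption chosen for illustration, not the servers' documented requirement; check each MCP server's README for the authoritative set, and narrow "Resource" where your setup allows it.

```python
import json

# ASSUMPTION: illustrative read-only actions, not an official list.
# Verify against each MCP server's README before using.
READ_ONLY_ACTIONS = [
    "cloudwatch:DescribeAlarms",
    "cloudwatch:DescribeAlarmHistory",
    "cloudwatch:GetMetricData",
    "cloudwatch:ListMetrics",
    "logs:DescribeLogGroups",
    "logs:StartQuery",
    "logs:GetQueryResults",
    "logs:StopQuery",
    "application-signals:ListServices",
    "application-signals:GetService",
    "application-signals:ListServiceLevelObjectives",
    "application-signals:GetServiceLevelObjective",
    "xray:GetTraceSummaries",
    "xray:BatchGetTraces",
]


def build_read_only_policy(actions=READ_ONLY_ACTIONS):
    """Return an IAM policy document (as a dict) that allows only the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "McpObservabilityReadOnly",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": sorted(actions),
                "Resource": "*",  # narrow to specific ARNs where possible
            }
        ],
    }


# Print the policy as JSON so it can be pasted into the IAM console or a template.
print(json.dumps(build_read_only_policy(), indent=2))
```

Keeping the list free of wildcards and write actions means the AI assistant can read telemetry through the MCP servers but cannot modify alarms, logs, or SLOs with these credentials.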

Setting up the Environment

Before proceeding with the setup, it’s crucial to have a well-configured observability setup. Here are some best practices to follow:

  1. Activate CloudWatch Alarms: Ensure you have active CloudWatch alarms, as they provide valuable context for the Amazon Q CLI to understand and respond to your queries effectively. For guidance on creating CloudWatch alarms, refer to the CloudWatch Alarms documentation.
  2. Define SLOs in Application Signals: After enabling Application Signals, define Service Level Objectives (SLOs) to gain deeper insights into your application’s performance and behavior. To learn more, please see How to monitor application health using SLOs with Amazon CloudWatch Application Signals.
  3. Send CloudTrail Events to a CloudWatch Log Group: Integrating CloudTrail with CloudWatch Logs gives the Amazon Q CLI a comprehensive view of your infrastructure, further enhancing its ability to provide accurate and contextual responses. To learn more, refer to Sending CloudTrail events to CloudWatch Logs.

By following these best practices, you’ll ensure that the Amazon Q Developer CLI has access to the necessary telemetry data and can provide you with accurate and context-aware responses when troubleshooting and analyzing your AWS resources.
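
For the first best practice, an alarm on service faults could be defined along these lines. This is a minimal sketch: the namespace, metric name, dimension names, statistic, and thresholds are assumptions to verify against the metrics your services actually publish, and the alarm name is hypothetical. The parameter names themselves are those of the CloudWatch PutMetricAlarm API.

```python
def fault_alarm_params(service, environment, threshold=1.0, period_seconds=60):
    """Build keyword arguments for the CloudWatch PutMetricAlarm API."""
    return {
        "AlarmName": f"{service}-faults",   # hypothetical naming scheme
        "Namespace": "ApplicationSignals",  # ASSUMPTION: verify in your account
        "MetricName": "Fault",              # ASSUMPTION: verify in your account
        "Dimensions": [                     # ASSUMPTION: dimension names
            {"Name": "Environment", "Value": environment},
            {"Name": "Service", "Value": service},
        ],
        "Statistic": "Sum",                 # ASSUMPTION: pick what fits the metric
        "Period": period_seconds,
        "EvaluationPeriods": 3,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",
    }


def create_alarm(params, region):
    """Create the alarm. Requires boto3 and configured AWS credentials."""
    import boto3  # imported lazily so the builder above works without boto3

    boto3.client("cloudwatch", region_name=region).put_metric_alarm(**params)
```

Calling create_alarm(fault_alarm_params("ordering-service", "prod"), "us-east-1") would create the alarm; the environment name and region here are placeholders.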

Set up the Amazon Q Developer CLI

  1. Install Amazon Q Developer CLI on your system
  2. Install the uv utility from Astral or the GitHub README
  3. Use the uv utility to install Python version 3.10

uv python install 3.10

Configuring MCP Servers

  1. Configure the MCP servers. Amazon Q Developer CLI supports two levels of MCP configuration:
    1. Global Configuration: ~/.aws/amazonq/mcp.json – Applies to all workspaces
    2. Workspace Configuration: .amazonq/mcp.json – Specific to the current workspace
  2. Choose your preferred configuration level and add the following CloudWatch and Application Signals MCP server configuration to the corresponding mcp.json file. Replace the AWS_PROFILE and AWS_REGION placeholder values with your specific AWS profile and region.
{
  "mcpServers": {
    "awslabs.cloudwatch-mcp-server": {
      "autoApprove": [],
      "disabled": false,
      "command": "uvx",
      "args": [
        "awslabs.cloudwatch-mcp-server@latest"
      ],
      "env": {
        "AWS_PROFILE": "Add your AWS Profile",
        "AWS_REGION": "Add your AWS Region",
        "FASTMCP_LOG_LEVEL": "ERROR"
      },
      "transportType": "stdio"
    },
    "awslabs.cloudwatch-appsignals-mcp-server": {
      "autoApprove": [],
      "disabled": false,
      "command": "uvx",
      "args": [
        "awslabs.cloudwatch-appsignals-mcp-server@latest"
      ],
      "env": {
        "AWS_PROFILE": "Add your AWS Profile",
        "AWS_REGION": "Add your AWS Region",
        "FASTMCP_LOG_LEVEL": "ERROR"
      },
      "transportType": "stdio"
    }
  }
}
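
A common slip at this step is leaving the placeholder values in place. The sketch below is a hypothetical helper, not part of the Amazon Q Developer CLI, that checks a parsed mcp.json for the two servers and for unreplaced AWS_PROFILE and AWS_REGION values.

```python
import json
from pathlib import Path

EXPECTED_SERVERS = (
    "awslabs.cloudwatch-mcp-server",
    "awslabs.cloudwatch-appsignals-mcp-server",
)


def find_config_problems(config):
    """Return a list of human-readable problems found in a parsed mcp.json."""
    problems = []
    servers = config.get("mcpServers", {})
    for name in EXPECTED_SERVERS:
        server = servers.get(name)
        if server is None:
            problems.append(f"{name}: missing from mcpServers")
            continue
        if server.get("disabled"):
            problems.append(f"{name}: disabled is true")
        env = server.get("env", {})
        for key in ("AWS_PROFILE", "AWS_REGION"):
            value = env.get(key, "")
            # The sample configuration uses "Add your ..." as placeholder text.
            if not value or value.startswith("Add your"):
                problems.append(f"{name}: {key} still holds a placeholder")
    return problems


def check_file(path):
    """Load an mcp.json file from disk and return its problems."""
    return find_config_problems(json.loads(Path(path).read_text()))
```

For the global configuration, check_file(Path.home() / ".aws" / "amazonq" / "mcp.json") returns an empty list when both servers are present, enabled, and have real profile and region values.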

Now that you have the Amazon Q CLI installed, the AWS credentials configured, and the MCP servers set up, you can start using the CloudWatch and Application Signals MCP servers to troubleshoot and analyze your AWS resources through natural language queries.

Interacting with Amazon Q CLI

  1. Start a conversation with the q chat command
  2. Verify MCP server configuration
    1. Run /mcp command to confirm MCP servers are loaded as shown in Figure 1.
      Figure 1. Validate MCP servers have loaded
    2. Use /tools command to view available tools and capabilities as shown in Figure 2.
      Figure 2. List of available tools
  3. Explore available features by asking What questions can I ask about CloudWatch or Application Signals MCP Servers? to understand the full range of capabilities and possible queries, as shown in Figure 3.
    Figure 3. Discover capabilities of CloudWatch and Application Signals MCP servers

Real-World Use Case – Identifying and Resolving Permission Issues

Scenario

A DevOps team is alerted to multiple faults in their critical ordering-service, which could disrupt business operations. The team needs to quickly:

  1. Identify the root cause of the faults
  2. Determine when the issue started
  3. Find out who made the changes that caused the issue
  4. Implement the necessary fixes

Traditional Approach

Troubleshooting permission issues often involves tedious log analysis, trial-and-error testing, and deep dives into IAM policies. Even with complete knowledge of an application architecture, this can be time-consuming and frustrating.

Intelligent Troubleshooting with Amazon Q CLI

Step 1: Identify Root Cause

We’ll begin by asking Amazon Q CLI to “review my ordering-service and provide remediation steps and an RCA for the cause of the faults”.

Amazon Q CLI leverages the Application Signals MCP server to deliver comprehensive troubleshooting capabilities through an intelligent, automated approach. The system performs real-time analysis of service health metrics, examines fault patterns and error messages, and precisely identifies permission-related failures as shown in Figure 4.

Figure 4. Ask Amazon Q CLI to identify the cause of the issue

Upon completing this analysis, it provides users with detailed remediation instructions, a thorough root cause analysis that explains any permission gaps, and a complete assessment of operational impacts as shown in Figure 5.

Figure 5. Output of the Q CLI showing the RCA and remediation steps

This AI-driven approach reduces resolution time and gives teams insights they can use to prevent similar issues in the future.

Step 2: Track Changes

Next, we’ll identify the exact time and identity that performed the change. We’ll ask Amazon Q CLI to “identify when and who changed the permissions on the role”.

Amazon Q CLI selects the most suitable tool available for each task. In this case, it uses its built-in use_aws tool, as shown in Figure 6, to analyze CloudTrail events: it builds a timeline of role modifications, pinpoints the specific changes, and identifies who made them and when. The result is an audit trail of permission changes that lets teams find the root cause of permission-related issues without manual log investigation.

Figure 6. Asking Amazon Q CLI to identify when and who changed permissions
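
To make the kind of filtering involved in this step concrete, here is a minimal sketch over CloudTrail-style records. It is not how Amazon Q CLI or use_aws is implemented. The field names follow the CloudTrail record format, while the role name, user ARN, account ID, and timestamps are hypothetical sample data.

```python
from datetime import datetime

# IAM API calls that change what a role is allowed to do.
ROLE_PERMISSION_EVENTS = {
    "PutRolePolicy",
    "DeleteRolePolicy",
    "AttachRolePolicy",
    "DetachRolePolicy",
}


def role_permission_changes(events, role_name):
    """Return (time, event name, caller ARN) per permission change, oldest first."""
    changes = []
    for event in events:
        if event.get("eventSource") != "iam.amazonaws.com":
            continue
        if event.get("eventName") not in ROLE_PERMISSION_EVENTS:
            continue
        if (event.get("requestParameters") or {}).get("roleName") != role_name:
            continue
        when = datetime.strptime(event["eventTime"], "%Y-%m-%dT%H:%M:%SZ")
        who = event.get("userIdentity", {}).get("arn", "unknown")
        changes.append((when, event["eventName"], who))
    return sorted(changes)


# Hypothetical sample records, trimmed to the fields used above.
SAMPLE_EVENTS = [
    {
        "eventTime": "2025-01-15T10:42:07Z",
        "eventSource": "iam.amazonaws.com",
        "eventName": "DeleteRolePolicy",
        "userIdentity": {"arn": "arn:aws:iam::111122223333:user/example-user"},
        "requestParameters": {"roleName": "ordering-service-role"},
    },
    {
        "eventTime": "2025-01-15T10:40:00Z",
        "eventSource": "iam.amazonaws.com",
        "eventName": "DetachRolePolicy",
        "userIdentity": {"arn": "arn:aws:iam::111122223333:user/example-user"},
        "requestParameters": {"roleName": "ordering-service-role"},
    },
    {
        "eventTime": "2025-01-15T09:00:00Z",
        "eventSource": "iam.amazonaws.com",
        "eventName": "GetRole",  # read-only call, filtered out
        "userIdentity": {"arn": "arn:aws:iam::111122223333:user/example-user"},
        "requestParameters": {"roleName": "ordering-service-role"},
    },
]

for when, name, who in role_permission_changes(SAMPLE_EVENTS, "ordering-service-role"):
    print(when.isoformat(), name, who)
```

The earliest entry in the sorted result answers "when did the issue start", and the caller ARN answers "who made the change".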

Step 3: Implement Fix

Now that we have identified the cause, when it was introduced, and who introduced it, we need to reverse the permissions change. Manually updating IAM policies requires careful syntax and a deep understanding of least-privilege principles. There’s also the risk of introducing new vulnerabilities if not done correctly. We’ll ask Amazon Q CLI to “Fix the permissions issue”.

Amazon Q CLI adds the missing permissions to the service role, restoring the ordering-service to its previous state. This guided remediation process, with built-in security safeguards and validation procedures, keeps the fix efficient, maintains security best practices, and reduces the risk of human error.

Figure 7. Asking Amazon Q CLI to fix the permissions issue
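
Before accepting any proposed fix, it helps to see exactly which actions were lost. The sketch below compares two IAM policy documents and lists the allowed actions present in the earlier one but missing from the current one. Both policies are hypothetical examples, and the comparison deliberately ignores Resource, Condition, Deny statements, and wildcard expansion, so treat it as a quick review aid rather than a full policy evaluation.

```python
def allowed_actions(policy):
    """Collect every action named in the Allow statements of an IAM policy document."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be given as an object
        statements = [statements]
    actions = set()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        action = statement.get("Action", [])
        actions.update([action] if isinstance(action, str) else action)
    return actions


def missing_actions(previous_policy, current_policy):
    """Return allowed actions that were in the previous policy but not the current one."""
    return sorted(allowed_actions(previous_policy) - allowed_actions(current_policy))


# Hypothetical before/after policies for the service role.
PREVIOUS = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["dynamodb:GetItem", "dynamodb:PutItem"], "Resource": "*"}
    ],
}
CURRENT = {
    "Version": "2012-10-17",
    "Statement": {"Effect": "Allow", "Action": "dynamodb:GetItem", "Resource": "*"},
}

print(missing_actions(PREVIOUS, CURRENT))
```

Restoring only the actions in this list, rather than granting a broader policy, keeps the fix aligned with least privilege.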

The following video demonstrates the complete workflow from investigation to resolution.

Figure 8. Full investigation and remediation using Amazon Q CLI and the CloudWatch and Application Signals MCP servers.

Common Investigation Sample Queries

Here are some example queries you can use with Amazon Q CLI to leverage CloudWatch and Application Signals MCP servers:

  1. Advanced SLO Analysis – “My payment-service SLO is breached – perform a complete root cause analysis including which specific operations are failing, what the error patterns are in the logs, and provide actionable remediation steps”
  2. Service Dependencies – “Map out the complete request flow for user checkout transactions, identify bottlenecks across all services, and show me where the highest latency is introduced in the chain”
  3. Performance Optimization – “Show me how my AI/ML service token usage patterns correlate with latency spikes, and identify which models are causing the most performance issues”
  4. Error Investigation – “Find all distributed transaction failures across my microservices in the last 24 hours, group them by root cause, and show me the customer impact of each failure type”
  5. Predictive Analysis – “Analyze seasonal patterns in my service performance over the last 3 months, predict when I’ll hit capacity limits, and recommend scaling strategies”
  6. Security Analysis – “Investigate suspicious traffic patterns by analyzing traces with unusual latency signatures, correlating with security logs, and identifying potential attack vectors”

These prompts demonstrate how Amazon Q CLI can help you investigate complex operational scenarios, analyze performance patterns, and get actionable insights for your AWS resources.

Conclusion

In this blog post, we showed you how Amazon CloudWatch and Application Signals MCP servers enhance your operational workflow through four key benefits: context-aware search capabilities, natural language queries, interactive troubleshooting workflows, and a streamlined developer experience. These features work together to help you identify issues faster, spend less time on routine tasks, and reduce incident resolution time.

To explore these capabilities further, check out the GitHub repository for the Amazon CloudWatch and Application Signals MCP servers. For more information about implementing MCP servers on AWS, visit Harness the power of MCP servers with Amazon Bedrock Agents and Unlocking the power of Model Context Protocol (MCP) on AWS. To learn more about AWS observability best practices, we recommend the AWS Observability Best Practices guide and the One Observability Workshop.

Raviteja Sunkavalli

Raviteja Sunkavalli is a Senior Worldwide Specialist Solutions Architect at Amazon Web Services, specializing in observability and incident management. He helps global customers implement comprehensive monitoring and remediation solutions to streamline their cloud operations and enhance resilience. Outside of work, Ravi enjoys playing cricket and exploring new cooking recipes.

Joe Alioto

Joe is a Senior Specialist Solutions Architect for Cloud Operations focusing on Observability, Governance, and Centralized Operations Management on AWS. He has over two decades of hands-on operations engineering and architecture experience. When he isn't working, he enjoys spending time with his family, learning new technologies, and PC gaming.

Matheus Arrais

Matheus Arrais is the WW Tech Leader for Cloud Operations at AWS. He is responsible for the global direction of an internal community of hundreds of AWS experts focused on the operational capabilities of AWS. Matheus works closely with the AWS service teams to design solutions at scale that help customers implement and support complex cloud infrastructure. Find him on LinkedIn: https://www.linkedin.com/in/matheusarrais/