AWS Cloud Operations Blog
Innovation sandbox on AWS with real-time analytics dashboard
How do you deploy hundreds of AWS accounts for a large-scale hackathon? Provide real-time visibility to leadership? Enable participant self-service while monitoring spending across accounts? Enterprise innovation events often lack real-time visibility into participant engagement, resource utilization, and outcomes. Leaders can’t see engagement metrics; builders can’t access accounts and information on-demand. Without observability and governance, […]
Investigating Service Issues with Amazon CloudWatch Application Signals Custom Metrics
When a critical service fails, you need to know how much revenue you’re losing, not just that latency has increased. This post shows you how to integrate business metrics with CloudWatch Application Signals to see both technical performance and business impact in one unified view. With CloudWatch Application Signals, you can view metrics, traces, and […]
Cross-Region AWS PrivateLink monitoring with Amazon CloudWatch Network Synthetic Monitor
Global, distributed AWS architectures are the backbone for customers seeking high availability, resilience, and regulatory compliance. Workloads are commonly deployed across multiple AWS Regions and Availability Zones (AZs), often using AWS PrivateLink to connect services securely and privately across Amazon Virtual Private Cloud (Amazon VPC) networks. This approach enhances security and separation while requiring […]
Alerting Best Practices with Amazon Managed Service for Prometheus
Alerts connect telemetry to action. Effective alert management helps you detect problems quickly, maintain resilience, and build customer trust. So, what is the best way to manage alerts when storing metrics in Amazon Managed Service for Prometheus? In this blog post, you will learn how to create, route, and administer alerting rules in Amazon […]
Search and discover governance controls with Control Catalog in AWS Control Tower
As you scale your AWS environment from hundreds to thousands of AWS accounts, maintaining consistent governance standards across this expanded infrastructure requires a strategic approach. Governance controls—the automated policies and rules that enforce standards across your cloud infrastructure—are essential for managing this scale, but implementing them presents two fundamental challenges. First, without proper controls, a […]
Resolve application issues autonomously with AWS DevOps Agent (Preview) and Dynatrace
Application issues require fast resolution to maintain business continuity and customer satisfaction, but manual investigation creates delays that can cost organizations significantly in lost revenue and productivity. Last week, we launched AWS DevOps Agent (Preview), a frontier agent that resolves and proactively prevents incidents, continuously improving reliability and performance of applications in AWS, multicloud, and […]
Troubleshoot AWS Tagging Compliance with AWS Resource Explorer
With AWS Resource Explorer’s immediate resource discovery launch on October 13, 2025, customers can now discover resources from their very first search in Unified Search in the AWS Management Console or the Resource Explorer console. Operations like troubleshooting and problem resolution, making resource changes, investigating resource dependencies, identifying security risks, and optimizing costs are critical […]
Amazon CloudWatch RUM now supports mobile application monitoring
Amazon CloudWatch RUM now supports iOS and Android applications, expanding real user monitoring beyond web applications. Developers and SREs can now quickly isolate mobile application issues and improve end-user experience, with visibility into performance metrics such as screen load times, crash rates, and API latencies.
Prometheus MCP Server: AI-Driven Monitoring Intelligence for AWS Users
We recently launched the open source Prometheus Model Context Protocol (MCP) server for Amazon Managed Service for Prometheus. This new capability enables artificial intelligence (AI) code assistants such as Amazon Q Developer CLI, Cline, and Cursor to interact with your Prometheus monitoring infrastructure through natural language queries. The MCP server provides AI assistants with contextual […]
2025 Top 10 Announcements for AWS Cloud Operations
At AWS re:Invent 2025, we’re excited to share the latest innovations designed to empower organizations to thrive in the transformative AI era. This year’s top Cloud Operations announcements address the most pressing challenges our customers face today, from gaining comprehensive visibility into generative AI workloads to significantly accelerating incident resolution and efficiently managing the exponential growth of […]








