Intelligent Operations

Accelerate operational investigations and remediation

Overview

Use the extensive operational experience that AWS has accumulated and refined over 17 years of delivering cloud services to millions of customers around the world. We've applied AI and machine learning (ML) to help enhance, accelerate, and automate your cloud operations processes. Intelligent operations allows you to easily observe your applications and infrastructure components, accelerate operational investigations and troubleshooting, and take actions to resolve and remediate operational issues, improving mean time to recovery (MTTR).

Benefits

AWS has more operational experience and scale than any other cloud provider, delivering cloud services to millions of customers around the world for over 17 years. We built this experience into Amazon CloudWatch capabilities to help guide you through troubleshooting and remediation, so you can complete operational investigations across your AWS environment in just a fraction of the time.

Start an operational investigation from anywhere in the AWS Management Console. You can configure CloudWatch to begin an investigation as soon as an alarm goes off, or create an investigation from an Amazon Q chat. CloudWatch works alongside you in the investigation, helping you identify anomalies in your applications and develop hypotheses about the root cause of issues.

CloudWatch suggests remediation actions for common AWS issues by surfacing relevant AWS Systems Manager Automation runbooks, AWS re:Post articles, and documentation. You can fill in key parameters and review runbook contents prior to running the runbook to resolve the issue.
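As a rough illustration of the "fill in key parameters and review" step, the following Python sketch assembles the request that would start a Systems Manager Automation runbook, without actually running it. It is a sketch under stated assumptions, not how CloudWatch itself invokes runbooks: the helper name `build_automation_request` is hypothetical, the instance ID is a placeholder, and the runbook `AWS-RestartEC2Instance` with its `InstanceId` parameter is used only as an example of an AWS-owned runbook.

```python
from typing import Dict, List, Optional


def build_automation_request(
    document_name: str,
    parameters: Dict[str, str],
    assume_role_arn: Optional[str] = None,
) -> Dict[str, object]:
    """Assemble keyword arguments for the SSM StartAutomationExecution call.

    Hypothetical helper for illustration; it only builds the request so an
    operator can review it before anything is executed.
    """
    # Refuse to build a request while any parameter is still blank.
    missing = sorted(name for name, value in parameters.items() if not value.strip())
    if missing:
        raise ValueError("Missing values for parameters: " + ", ".join(missing))

    # Automation parameters are passed as a map of name -> list of strings.
    ssm_parameters: Dict[str, List[str]] = {
        name: [value] for name, value in parameters.items()
    }
    if assume_role_arn:
        ssm_parameters["AutomationAssumeRole"] = [assume_role_arn]

    return {"DocumentName": document_name, "Parameters": ssm_parameters}


# Placeholder instance ID; replace with a real one before use.
request = build_automation_request(
    "AWS-RestartEC2Instance",
    {"InstanceId": "i-0123456789abcdef0"},
)
print(request)
```

After reviewing the printed request, it could be passed to the AWS SDK for Python, for example `boto3.client("ssm").start_automation_execution(**request)`, assuming boto3 is installed and credentials with the necessary permissions are configured.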

CloudWatch works alongside you throughout your troubleshooting journey, from issue triage through remediation, saving you time in finding the root cause. CloudWatch automatically adds context to your observability data, allowing operators of all experience levels to expertly navigate across telemetry and related resources.

Use cases

You can configure CloudWatch to automatically start investigating when a CloudWatch alarm goes off. By the time you’ve opened your laptop, CloudWatch is already sifting through your telemetry looking for anomalies. CloudWatch uses its knowledge of your AWS resources to discover the relationships between them and suggest possible root causes for the alarm, helping you get back into production faster than ever before.

Amazon CloudWatch adds context to observability data, transforming disparate metrics and logs into real-time insights. This feature is integrated across the AWS Management Console and accessible from multiple entry points, so you can easily navigate related telemetry and visualize relationships between resources to accelerate analysis.

Quickly gather insights from your observability data without needing extensive knowledge of the query language. You can ask questions of your logs and metrics in plain English, and CloudWatch will generate the appropriate queries for you.
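To make this concrete, here is a hand-written Python sketch of the kind of query a plain-English question such as "show me the 20 most recent log events containing ERROR" might correspond to. It assumes the query language in question is CloudWatch Logs Insights. The function `build_recent_matches_query` is a hypothetical helper written for this illustration, and its output is not the actual text CloudWatch would generate.

```python
def build_recent_matches_query(term: str, limit: int = 20) -> str:
    """Build a CloudWatch Logs Insights query for recent events matching a term.

    Hypothetical helper for illustration only.
    """
    # Keep the sketch simple: accept only letters and digits so the term can
    # be placed inside a /regex/ literal without any escaping.
    if not term.isalnum():
        raise ValueError("term must contain only letters and digits")
    if limit < 1:
        raise ValueError("limit must be at least 1")

    return "\n".join(
        [
            "fields @timestamp, @message",
            f"| filter @message like /{term}/",
            "| sort @timestamp desc",
            f"| limit {limit}",
        ]
    )


print(build_recent_matches_query("ERROR"))
```

The resulting string could be pasted into the Logs Insights query editor, or submitted with the AWS SDK for Python via the CloudWatch Logs `start_query` call, assuming a log group name and time range are supplied.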