AWS Cloud Operations Blog
Category: Amazon CloudWatch
How The Washington Post’s Arc XP uses CloudWatch Metrics Explorer to reduce costs
This post describes how The Washington Post’s Arc XP uses Metrics Explorer to monitor its global SaaS platform and reduce costs.
Using Amazon CloudWatch with Amazon EventBridge for cross-account event monitoring
We often talk about event-driven architectures, where an event is something that happens within your application or architecture. It could be a new file received by your application or an alert triggered by high CPU utilization. We can act on these events by scanning the file contents or scaling out more […]
Collecting Apache Flink metrics in the Amazon CloudWatch agent
Apache Flink is a distributed stream processing engine. You can run Flink on Amazon EMR as a YARN application. You can view Flink metrics through its web UI, but what if you want to react to them? In this blog post, I’ll show you how to use the CloudWatch agent to collect Flink metrics into […]
Use AWS CloudWatch Contributor Insights to monitor CIS AWS Foundations Benchmark controls
Contributor Insights is a feature of AWS CloudWatch that analyzes log data to create time series that display contributor data. It helps you understand who or what is impacting your system and application performance by identifying top talkers, pinpointing outliers, finding the heaviest traffic patterns, and ranking the top system […]
Introducing CloudWatch Resource Health to monitor your EC2 hosts
Today, AWS announced Amazon CloudWatch Resource Health, a fully managed solution that customers can use to automatically discover, manage, and visualize the health and performance of Amazon Elastic Compute Cloud (Amazon EC2) hosts across their applications. Resource Health provides a centralized view of your EC2 hosts by performance dimensions such as CPU or memory utilization. […]
Monitoring your EC2 server fleet with advanced CloudWatch agent capabilities
Customers who are running fleets of Amazon Elastic Compute Cloud (Amazon EC2) instances use advanced monitoring techniques to observe their operational performance. Capabilities like aggregated and custom dimensions help customers categorize and customize their metrics across server fleets for fast and efficient decision making. Customers need visibility not only into infrastructure metrics (like CPU and […]
Reinventing automated operations (Part II)
The first post in this series, Reinventing automated operations (Part I), covered the importance of operations in the cloud and how deferring the creation of an operations plan can slow down your migration. In this post, I’ll share the primary mechanism of iterative improvement (aka flywheel) that AWS Managed Services (AMS) uses to increase operational […]
Detecting and remediating process issues on EC2 instances using Amazon CloudWatch and AWS Systems Manager
Customers want visibility into the processes running inside their Amazon Elastic Compute Cloud (Amazon EC2) instances. Critical processes and services in these instances can crash unexpectedly, and when they do, it’s crucial for customers to be notified so they can maintain business operations. There are multiple ways to see if a service is […]
Use AWS Control Tower lifecycle events to automate configuration of AWS accounts for ServiceNow IT operations management
Several organizations that I work with use ServiceNow’s IT Operations Management capabilities for their on-premises infrastructure and want to use the same capabilities for their AWS environment. Some of the core capabilities of ServiceNow’s IT Operations Management are ServiceNow Discovery, Event Management, and Cloud Management. Currently, customers who want to enable ServiceNow’s Cloud […]
Delete Amazon CloudWatch Synthetics dependent resources when you delete a CloudFormation stack
Amazon CloudWatch Synthetics allows you to monitor application endpoints more easily. It runs tests on your endpoints every minute, and alerts you if your application endpoints don’t behave as expected. These tests can be customized to check for availability, latency, transactions, broken or dead links, page load errors, load latencies for UI assets, complex wizard […]




