AWS Cloud Operations Blog

Tag: Cloud Operations

Creating a correction of errors document

This blog post will walk you through an example of creating a Correction of Errors (COE) document. At Amazon, operational excellence is in our DNA. One best practice that we have learned at Amazon is to have a standard mechanism for post-incident analysis. The COE process facilitates learning from an event to avoid reoccurrences in […]

Self-service Account Provisioning Using AWS Service Management Connector for ServiceNow

Many customers are looking to adopt a multi-account strategy within their AWS environment. This allows customers to isolate their workloads into different environments including test, dev, and production in addition to separating workloads based on regulatory requirements. As customers scale their multi-account environments, one strategy to increase agility is to offer business units their own […]

Using AWS AppConfig to Manage Multi-Tenant SaaS Configurations

Using AWS AppConfig to Manage Multi-Tenant SaaS Configurations

As a Software as a Service (SaaS) provider, you can benefit from a SaaS operating model in a number of ways. One of the most impactful benefits you can realize is improvements to your operational efficiency, and one of the fundamental techniques you can leverage is to maintain a single software version for all your […]

Automate insights for your EC2 fleets across AWS accounts and regions

Automate insights for your EC2 fleets across AWS accounts and regions

Introduction Gaining insights and managing large Amazon Elastic Compute Cloud (Amazon EC2) fleet that is spread across multiple accounts and regions can be a challenging task. It’s crucial to have a quick and efficient method to identify which instances are managed by AWS Systems Manager (SSM) and gather detailed information about the instances that are […]

Centralize AWS Cost Anomaly Detection using Amazon Managed Grafana

AWS Cost Anomaly Detection uses advanced Machine Learning to identify anomalous spend and root causes, empowering the customers to take action quickly. Currently, in order to view the AWS Cost Anomalies in AWS Cost Explorer, it requires the user to have IAM user access privileges on the AWS Management Console. The ability to centrally monitor and […]

Centralized Dashboard for AWS Config and AWS Security Hub

Back in July 2022, we announced AWS config compliance scores for conformance packs which helps you quantify your compliance posture as an Amazon CloudWatch metric. It’s a quantitative measure of compliance status. While customers can have hundreds of AWS accounts where AWS Config is enabled and each account and each AWS Region have a different compliance score. While […]

Using the Fault Tolerance Analyser Tool to Identify Potential Issues

Introduction Ensuring resilience, the ability for a system to recover from a failure induced by load, attacks, and other issues, is a shared responsibility that underpins the reliability of your workloads. While AWS provides the resilient underlying cloud infrastructure, customers are tasked with maintaining the resilience of their applications. In this landscape of joint responsibility, […]

Provision products and raise patch change requests in AWS via ServiceNow

ServiceNow is a popular cloud-based IT Service Management (ITSM) platform. Organizations use ServiceNow to manage incidents, track scheduled and planned infrastructure changes, manage new service requests and track configuration items across IT systems. Common questions I’ve had from customers include how they can use ServiceNow to provision new instances. Or, how to use ServiceNow to […]

Build AWS Systems Manager Automation runbooks using AWS CDK

AWS Systems Manager Automation runbooks let you deploy, configure, and manage AWS resources safely and at scale. You can use AWS-published runbooks or build your own to enable AWS resource management across multiple accounts and regions. The AWS Cloud Development Kit (AWS CDK v2) is an open-source framework that can build applications with the expressive power of […]

Visualizing Resources with Workload Discovery on AWS

Operations Teams (Ops Teams) across enterprises typically rely on documented architecture diagrams to understand the dependencies of various workloads deployed on AWS. As enterprises continue to deploy large-scale multi-tiered workloads, it can become challenging for Ops Teams to track the ever changing relationships between the deployed resources, often meaning that documentation can’t keep up with […]