AWS Cloud Operations Blog
Category: Management & Governance
Build a Multi Account Patch Compliance Dashboard with Kiro Specs
Robust patch management is essential for maintaining system security, reliability, and compliance across your IT infrastructure. AWS Systems Manager Patch Manager provides a full-featured patching solution, enabling you to automate the deployment of operating system updates to managed nodes across AWS accounts, on-premises, and multicloud environments. However, as your organization scales across dozens or […]
From Monolith to Multi-Account: Pinterest’s AWS Organization Transformation Journey
Pinterest launched in 2009 with a mission to bring everyone the inspiration to create a life they love. As one of the early cloud pioneers, Pinterest grew to hundreds of thousands of resources and exabytes of data within a single AWS account well before most cloud-native organizations operated at that scale or the best […]
How Honeycomb improved resilience using AWS Fault Injection Service
Building resilience within cloud workloads is an important goal for ISVs to prevent application downtime, increase system reliability, and build customer trust. Honeycomb.io is a fast and collaborative observability platform for software developers and engineering teams to understand and troubleshoot their cloud-native applications. Honeycomb gives you the rich context at sub-second query speeds and AI-assisted […]
Import Historical data from AWS CloudTrail Lake to Amazon CloudWatch
Organizations managing workloads on AWS rely on AWS CloudTrail to answer the fundamental questions: Who did what, where, and when? Since January 2022, customers have stored their CloudTrail activity logs in CloudTrail Lake, a managed data lake purpose-built for capturing, storing, and querying user and API activity across their AWS environment. As organizations scale across multiple […]
Shift-Left Tag Compliance using AWS Organizations and Terraform
In this post you will learn about AWS Organizations tag policies, the tag_policy_compliance Terraform provider setting, a reusable tagging module that automatically applies required tags, and a test-driven approach that dynamically validates against your organizational policies.
Simplifying Prometheus metrics collection across your AWS infrastructure
If you’re running services such as Amazon EC2 instances, Amazon Elastic Container Service (Amazon ECS) containers, and Amazon Managed Streaming for Apache Kafka (Amazon MSK) clusters in AWS, maintaining separate Prometheus servers for each environment creates significant operational burden. Managing scraper configurations, high availability, scaling, and security distracts you from building great applications. AWS managed […]
AWS Unified Operations: Building Resilient Operations for Mission-Critical Workloads
Achieve Mission-Critical Resiliency at Scale with AWS Unified Operations – The Top Tier of AWS Support to Achieve High Availability, Faster Migrations, and Accelerated Incident Resolution. The Shift-Left Paradigm: From Reactive Firefighting to Proactive Prevention. Organizations running mission-critical workloads face three critical operational gaps that undermine resilience and slow cloud adoption. Skills gaps make cloud-native […]
Essential security controls to prevent unauthorized account removal in AWS Organizations
When AWS member accounts are compromised, attackers can remove them from your organization, disabling all governance controls. In this post, you’ll learn how to keep compromised accounts from leaving your AWS organization by using layered security controls, including service control policies, secure account migration, and centralized root access management. AWS secures the infrastructure […]
Adaptive sampling with AWS X-Ray to capture critical spans
Enterprise applications using AWS X-Ray generate large volumes of distributed tracing data across multiple services. Static sampling strategies keep costs down by capturing a fixed percentage of traffic. However, they frequently miss critical data during intermittent failures or sudden latency spikes. Tracing every request for maximum visibility at scale may increase sampling costs for […]
Automate AWS Systems Manager activation for hybrid-managed node registration
AWS Systems Manager (formerly known as SSM) is an AWS service that you can use to view and control your servers on AWS cloud and on-premises infrastructure. Systems Manager makes it easy to manage a hybrid environment. To set up servers and virtual machines (VMs) in your hybrid environment as Systems Manager managed instances, you […]