AWS Cloud Operations Blog

Category: Technical How-to

Simulating partial failures with AWS Fault Injection Service

Modern distributed systems must be resilient to unexpected disruptions to maintain availability, performance, and stability. Chaos engineering helps teams uncover hidden weaknesses by deliberately injecting faults into a system and observing how it recovers. While traditional testing validates expected behavior, chaos engineering tests system resilience during failures. AWS Fault Injection Service (AWS FIS) is a […]

Observing Agentic AI workloads using Amazon CloudWatch

As the adoption of agentic AI applications continues to grow, ensuring the reliability, performance, and overall observability of these systems becomes increasingly critical. Agentic AI applications, powered by large language models (LLMs) and integrated with various data sources and APIs, can quickly become complex, making it challenging to gain visibility into their inner workings […]

Best practices for utilizing AWS Systems Manager with AWS Fault Injection Service

In today’s cloud-centric world, ensuring the resilience of mission-critical applications is paramount. The ability to withstand and recover from unexpected failures, including degradation of cloud provider services, can mean the difference between seamless operation and costly downtime. This is where the powerful combination of AWS Systems Manager (SSM) and AWS Fault Injection Service (AWS […]

New: AWS CloudTrail Lake Event Enrichment: Add Business Context to AWS Activity Logs

AWS customers use AWS CloudTrail Lake to aggregate and analyze their AWS activity for security, operational troubleshooting, and compliance purposes. However, when investigating security incidents or conducting compliance audits, customers often need additional business context beyond the basic event details, like which team or project owns the affected resources, or what were the properties […]

Visualizing Amazon DynamoDB data with Amazon OpenSearch Service and Amazon Managed Grafana

High-performance applications with unlimited throughput capabilities pose significant monitoring challenges, especially when tracking real-time metrics, utilization, and throttling events across distributed database workloads. Near real-time visibility into metrics is crucial for application performance and cost optimization. AWS allows you to seamlessly integrate multiple services to tackle these operational complexities. With Amazon DynamoDB, you can build […]

Gain Compliance Insights in your AWS Environment Using Amazon Q Business

Enterprise organizations managing multiple AWS accounts face complexity as their cloud infrastructure scales. The exponential growth in resources, coupled with diverse configuration requirements across different business units, creates significant challenges in maintaining effective oversight of AWS environments. AWS Config is a service that continually assesses, audits, and evaluates the configurations and relationships of your resources […]

Maximizing Multi-Region Resilience with AWS Resilience Hub

In today’s fast-paced digital world, business continuity isn’t just a goal — it’s an achievable reality. As organizations continue to innovate and grow, their cloud-based applications have become the beating heart of modern business operations, delivering value to customers around the clock. Companies are taking their cloud strategy to the next level by embracing multi-Region […]

Key Governance, Risk, and Compliance Sessions at re:Inforce 2025

We are incredibly excited to see you at AWS re:Inforce, in Philadelphia, Pennsylvania, on June 16-18, 2025. This year’s Governance, Risk, and Compliance track features sessions on automating compliance, enhancing risk visibility, using generative AI for business growth, and maintaining security at scale, including 5 breakout sessions, 8 builder sessions, 7 chalk talks, 2 code […]

Automate registering Windows managed nodes with AWS Systems Manager

Managing hybrid infrastructure across AWS and on-premises environments presents a layer of operational complexity for managing nodes. Some teams use different tools to manage these systems based on the platform they are running on, while others use licensed Remote Monitoring and Management (RMM) software. Teams can use AWS Systems Manager hybrid activations to manage on-premise […]

Build Golden Images with CIS Linux Build Kit within Amazon EC2 Image Builder

The build and rollout of hardened and certified operating systems (OS) is imperative for any Cloud Operations (CloudOps) or Cloud Center of Excellence (CCoE) team within an organization. The guidelines and security controls used to certify the images come from the respective teams within your organization who, in turn, refer to the popular industry-wide […]