AWS Cloud Operations Blog

Category: Advanced (300)

Log analysis with facets, correlation, enrichment, and automation in Amazon CloudWatch Log Analytics

Teams working with distributed applications accumulate logs across multiple log groups, including application logs, access logs, and audit trails. When something needs investigating, an engineer opens the console and starts writing queries from scratch. The same query gets written differently by different people. The results lack context because the log event does not contain who […]

From Monolith to Multi-Account: Pinterest’s AWS Organization Transformation Journey

Introduction Pinterest launched in 2009 with a mission to bring everyone the inspiration to create a life they love. As one of the early cloud pioneers, Pinterest grew to hundreds of thousands of resources and exabytes of data within a single AWS account well before most cloud-native organizations operated at that scale or the best […]

Shift-Left Tag Compliance using AWS Organizations and Terraform

In this post you will learn about AWS Organizations tag policies, the tag_policy_compliance Terraform provider setting, a reusable tagging module that automatically applies required tags, and a test-driven approach that dynamically validates against your organizational policies.

Automate AWS Systems Manager activation for hybrid-managed node registration

AWS Systems Manager (formerly known as SSM) is an AWS service that you can use to view and control your servers on AWS cloud and on-premises infrastructure. Systems Manager makes it easy to manage a hybrid environment. To set up servers and virtual machines (VMs) in your hybrid environment as Systems Manager managed instances, you […]

Resolve application issues autonomously with AWS DevOps Agent (Preview) and Dynatrace

Application issues require fast resolution to maintain business continuity and customer satisfaction, but manual investigation creates delays that can cost organizations significantly in lost revenue and productivity. Last week, we launched AWS DevOps Agent (Preview), a frontier agent that resolves and proactively prevents incidents, continuously improving reliability and performance of applications in AWS, multicloud, and […]

AWS Organizations launches account state information for granular account lifecycle management

AWS Organizations enables customers to centrally manage their AWS accounts. Since many customers prefer to automate the account creation process, they can leverage CreateAccount API, thereby creating an account vending pipeline. This pipeline standardizes the deployment of policies, roles, and resources across new accounts while managing the complete lifecycle through eventual account closure. Through this […]

Best practices for utilizing AWS Systems Manager with AWS Fault Injection Service

Introduction In today’s cloud-centric world, ensuring the resilience of mission-critical applications is paramount. The ability to withstand and recover from unexpected failures, including degradation of cloud provider services, can mean the difference between seamless operation and costly downtime. This is where the powerful combination of AWS Systems Manager (SSM) and AWS Fault Injection Service (AWS […]

Analyze Azure Audit Logs with CloudTrail Lake

Analyze Azure Audit Logs with CloudTrail Lake

Introduction In the ever-evolving world of cloud computing, maintaining robust security and compliance is paramount. As usage of multicloud environments grows, the need for comprehensive monitoring and logging solutions becomes more critical. Enter the synergy of Azure Audit Logs and AWS CloudTrail Lake—a powerful combination that provides comprehensive visibility across your cloud environments. Azure Audit […]

Monitor AWS Transit Gateway Flow Logs centrally using Amazon Managed Grafana

Monitor AWS Transit Gateway Flow Logs centrally using Amazon Managed Grafana

As organizations continue to expand their cloud infrastructure by connecting multiple Amazon Virtual Private Clouds (Amazon VPC) across accounts and regions, the complexity of managing their network environment increases. AWS Transit Gateway has emerged as a powerful solution to simplify this complexity by providing a centralized hub for secure communication between Amazon VPCs, on-premises systems, and […]

How Hapag-Lloyd automated incident management using AWS Step Functions

This post is co-authored by Grzegorz Kaczor and Daniel Steenbock from Hapag-Lloyd AG and Michael Graumann and Daniel Moser from AWS. Introduction In today’s fast-paced digital landscape, efficient incident management is crucial for maintaining high-quality customer experiences. In our previous article we discussed how the Web and Mobile department at Hapag-Lloyd established observability for serverless […]