AWS Cloud Operations Blog

Category: Resilience

Why you should develop a correction of error (COE)

Application reliability is critical. Service interruptions result in a negative customer experience, which reduces customer trust and business value. One best practice that we have learned at Amazon is to have a standard mechanism for post-incident analysis. This lets us analyze a system after an incident in order to avoid recurrences. These […]