Best Practice
Establish a formal incident response plan that describes actions to be taken when a security incident is detected and reported.
AWS Implementation
AWS has implemented a formal, documented incident response policy and program. The policy addresses purpose, scope, roles, responsibilities, and management commitment.
AWS utilizes a three-phased approach to manage incidents:
1. Activation and Notification Phase: Incidents for AWS begin with the detection of an event. This can come from several sources, such as:
Metrics and alarms - AWS maintains an exceptional situational awareness capability. The majority of incidents are rapidly detected through 24x7x365 monitoring and alarming on real-time metrics and service dashboards. AWS utilizes early indicator alarms to proactively identify issues that may ultimately impact Customers.
Trouble ticket entered by an AWS employee.
Calls to the 24x7x365 technical support hotline.
If the event meets incident criteria, the relevant on-call support engineer will use AWS' event management tools to start an engagement and page the relevant program resolvers. The resolvers will analyze the incident to determine whether additional resolvers should be engaged and to determine the approximate root cause.
2. Recovery Phase - The relevant resolvers will perform break fix to address the incident. Once troubleshooting and break fix are complete and the affected components are addressed, the call leader will assign next steps in terms of follow-up documentation and follow-up actions, and end the call engagement.
3. Reconstitution Phase - Once the relevant fix activities are complete, the call leader will declare that the recovery phase is complete. Post mortem and deep root cause analysis of the incident will be assigned to the relevant team. The results of the post mortem will be reviewed by relevant senior management, and relevant actions, such as design changes, will be captured in a Correction of Errors (COE) document and tracked to completion.
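The three phases above can be read as a simple forward-only state machine. The following is a minimal illustrative sketch, not AWS' actual event management tooling: the names Phase, Incident, page_resolver, add_action, complete_action and advance are all hypothetical, and the gating rules (at least one resolver paged before recovery, all follow-up actions complete before closure) are assumptions drawn from the description above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Phase(Enum):
    ACTIVATION_AND_NOTIFICATION = "activation and notification"
    RECOVERY = "recovery"
    RECONSTITUTION = "reconstitution"
    CLOSED = "closed"


# Allowed forward transitions; phases cannot be skipped or revisited.
NEXT_PHASE = {
    Phase.ACTIVATION_AND_NOTIFICATION: Phase.RECOVERY,
    Phase.RECOVERY: Phase.RECONSTITUTION,
    Phase.RECONSTITUTION: Phase.CLOSED,
}


@dataclass
class Incident:
    """Hypothetical incident record; field names are illustrative only."""
    summary: str
    source: str  # e.g. "alarm", "ticket", "hotline"
    phase: Phase = Phase.ACTIVATION_AND_NOTIFICATION
    resolvers: list = field(default_factory=list)
    open_actions: set = field(default_factory=set)

    def page_resolver(self, name):
        # Engage a resolver for this incident.
        self.resolvers.append(name)

    def add_action(self, action):
        # Record a follow-up action (for example, a design change).
        self.open_actions.add(action)

    def complete_action(self, action):
        # Mark a follow-up action as done.
        self.open_actions.discard(action)

    def advance(self):
        # Move to the next phase, enforcing the assumed gating rules.
        if self.phase is Phase.CLOSED:
            raise ValueError("incident is already closed")
        if self.phase is Phase.ACTIVATION_AND_NOTIFICATION and not self.resolvers:
            raise ValueError("page at least one resolver before recovery")
        if self.phase is Phase.RECONSTITUTION and self.open_actions:
            raise ValueError("follow-up actions are still open")
        self.phase = NEXT_PHASE[self.phase]
        return self.phase
```

In this sketch, an incident created from an alarm would call page_resolver, then advance twice to reach the reconstitution phase. It cannot be closed until every action added with add_action has been completed, which mirrors the "tracked to completion" requirement for COE actions.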
In addition to the internal communication mechanisms detailed above, AWS has also implemented various methods of external communication to support its customer base and community. Mechanisms are in place to allow the customer support team to be notified of operational issues that impact the customer experience. A "Service Health Dashboard" is available and maintained by the customer support team to alert customers to any issues that may be of broad impact.
The AWS incident management program is reviewed by independent external auditors during audits for AWS SOC, PCI DSS, ISO 27001 and FedRAMP compliance.
Workflow documentation of Content (data) is the responsibility of AWS Customers as Customers retain ownership and control of their own guest operating systems, software, applications and data.