AWS for SAP
Improve your SAP business process resilience with Amazon CloudWatch Application Insights
Introduction
One reason thousands of customers choose AWS to run their mission-critical SAP workloads is to improve reliability and business resilience. In fact, a 2023 IDC report found that customers who run S/4HANA on AWS experience 58% less unplanned downtime vs. on-premises.
Increasingly, SAP customers are choosing RISE with SAP on AWS to support their AWS transformation. RISE with SAP on AWS gives customers S/4HANA Cloud on the world’s most secure, reliable, and extensive cloud infrastructure, and the broadest set of services to transform their business.
However, thousands of customers still run SAP on AWS using traditional license models, and are seeking guidance on configuring their SAP systems for high availability while minimizing complexity.
As organizations rely on mission-critical systems like SAP to drive their core business processes, ensuring resilience and continuity has become paramount. According to an IDC report, unplanned application downtime costs the Fortune 1000 from $1.25 billion to $2.5 billion every year. This cost includes lost productivity, lost revenue, and potential damage to the company’s reputation. To reduce this risk for mission-critical SAP workloads, you need to configure High Availability (HA) solutions that span multiple Availability Zones (AZs).
While the intention of introducing HA across multiple AZs is to minimize downtime, it also adds a layer of complexity to deploy and manage. Customers therefore need an appropriate level of observability into the health of the deployed HA solution, so that they know it is ready to handle system failures or other interruptions in service.
Observability of HA will enable your IT team to monitor, investigate, and troubleshoot issues holistically. It provides complete visibility across the distributed SAP environment, allowing for proactive detection of issues, simplifying the analysis of underlying causes, and optimizing the resolution process to minimize disruptions to critical operations.
In this blog, we discuss how to use Amazon CloudWatch Application Insights (CWAI) for SAP to observe SAP workloads on AWS that are deployed with a High Availability (HA) architecture pattern.
Business Scenario
In traditional SAP environments, HA architectures can be complex to configure, deploy, and manage. As an SAP Basis administrator, you need visibility into system availability and the ability to troubleshoot issues and identify the cause of a failure. To improve HA for SAP, in line with the AWS Well-Architected Framework and its SAP Lens, you will implement observability for your SAP workload with an HA design pattern to meet the following objectives:
- Incident management: quickly identify an availability issue, identify the failed component, set alarms with notifications, and restore business operations.
- Post-incident root-cause analysis: save time by using AWS machine learning algorithms to analyze system data for accurate root-cause analysis.
- Preventive and Corrective actions: observe SAP availability from a single pane of glass, configure drift detection for the HA solutions, and set adaptive thresholds with alerts for near real-time notification, so that you can identify anomalies before availability issues occur.
High Availability for SAP building blocks
When implementing HA, you need to be aware of the following key performance indicators as your primary objectives:
- The Mean Time to Recover (MTTR), which measures the average time needed to recover from a failure and prevent it from recurring.
- The Recovery Time Objective (RTO), which is the time required for your SAP application to recover following an outage and defines the acceptable period of unavailability before you restore service.
- The Recovery Point Objective (RPO), which determines the acceptable level of data loss during recovery for your application.
To learn more about RPO, RTO, and MTTR, please refer to the High Availability (HA) and Disaster Recovery (DR) section in the AWS documentation guide, “Architecture guidance for availability and reliability of SAP on AWS”.
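As a rough illustration of how these KPIs relate to each other, the following Python sketch computes MTTR from a list of incident records, flags incidents that exceeded an agreed RTO, and checks a recovery point against an RPO. The incident timestamps, the 10-minute RTO, and the 1-minute RPO are made-up values for illustration, not measurements from a real SAP system.

```python
from datetime import datetime, timedelta

# Hypothetical incident records: (outage start, service restored).
incidents = [
    (datetime(2024, 1, 10, 2, 0), datetime(2024, 1, 10, 2, 12)),
    (datetime(2024, 3, 5, 14, 30), datetime(2024, 3, 5, 14, 38)),
    (datetime(2024, 6, 21, 9, 15), datetime(2024, 6, 21, 9, 25)),
]


def mean_time_to_recover(records):
    """Average outage duration across all recorded incidents."""
    durations = [restored - started for started, restored in records]
    return sum(durations, timedelta()) / len(durations)


def rto_breaches(records, rto):
    """Incidents whose recovery took longer than the agreed RTO."""
    return [(started, restored) for started, restored in records
            if restored - started > rto]


def within_rpo(last_recovery_point, failure_time, rpo):
    """True if the data written since the last recovery point fits the RPO."""
    return failure_time - last_recovery_point <= rpo


agreed_rto = timedelta(minutes=10)  # assumed business target
agreed_rpo = timedelta(minutes=1)   # assumed business target

print("MTTR:", mean_time_to_recover(incidents))
print("RTO breaches:", len(rto_breaches(incidents, agreed_rto)))
print("RPO met:", within_rpo(datetime(2024, 1, 10, 1, 59, 30),
                             datetime(2024, 1, 10, 2, 0), agreed_rpo))
```

Tracking these values per incident makes it easier to see whether an HA design is actually meeting the targets that the business agreed on.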
Once your business has agreed on the RPO, RTO, and MTTR, it is essential to protect the Single Points of Failure (SPOFs) in an SAP system to prevent availability problems. This includes safeguarding SAP Central Services, SAP Application Server, NFS, Database, and SAP Web Dispatcher.
We also recommend taking regular backups of Amazon Elastic Block Store (EBS) volumes, Amazon EC2 images, Amazon Elastic File System (EFS), and databases using tools like AWS Backup, which also supports database-level backups of your SAP HANA database.
Let’s look at how we can implement protection mechanisms to achieve HA.
| # | SAP Single Point of Failure | Protection |
| 1 | SAP ABAP Central Services (ASCS) and Enqueue Replication Server (ERS) | ASCS consists of the Message Server, which manages user logon and workload distribution across application servers, and the Enqueue Server, which manages the application-level table locks (enqueue locks). You deploy the ASCS host in the primary Availability Zone (AZ) and the SAP Enqueue Replication Server (ERS) host in the secondary AZ. You then protect these hosts using OS clustering software. 1) |
| 2 | SAP Database (including HANA) | To improve MTTR and RPO, you can install an additional host running a standby copy of the HANA database in the secondary AZ, with HANA System Replication (HSR) enabled to protect against data loss and minimize downtime. You then protect these hosts using OS clustering software. 1) |
| 3 | SAP Primary Application Server (PAS) | Deploy additional SAP Application Servers balanced across multiple AZs to allow your users to reconnect and continue their business processes. |
| 4 | SAP Web Dispatcher | Configure an AWS Application Load Balancer (ALB) or Network Load Balancer (NLB), either internet-facing or internal, to front-end the Web Dispatcher traffic. The load balancer serves as the single point of contact for SAP Web Dispatcher, is highly available, and automatically scales its request handling capacity in response to incoming application traffic. |
| 5 | Shared file systems (NFS/SMB) | Implement Amazon Elastic File System (EFS), Amazon FSx for Windows File Server, or Amazon FSx for NetApp ONTAP as a highly available and durable managed file service that runs across multiple AZs. |
Note:
1) You can improve the RTO of the ASCS/ERS and the SAP database by implementing Pacemaker, an open-source HA cluster resource manager. Pacemaker manages the individual Amazon Elastic Compute Cloud (EC2) hosts (also called nodes) deployed in multiple AZs. It detects failures and orchestrates failover activities to recover your workload by moving it from the primary node to the secondary node, which is in a different AZ. You can also improve the availability of your SAP Application Servers by deploying SAP Additional Application Servers (AAS) in the secondary AZ to allow your users to reconnect and continue their business processes.
See Figure-1 for a reference architecture of an SAP system with an HA setup.
Figure-1: SAP HA deployment in two Availability Zones (AZ)
Improving observability of your HA solution
Amazon CloudWatch Application Insights (CWAI) for SAP lets you onboard your SAP application through a wizard, giving you immediate access to pre-built dashboards, metrics, logs, traces, and alarms. CWAI for SAP brings SAP availability monitoring for different organizational roles onto a single platform. During onboarding, CWAI automatically discovers the resources of SAP NetWeaver systems, such as Amazon EC2 instances, Amazon EBS volumes, and Amazon EFS file systems. It also enables metrics and dashboards out of the box. To learn how to onboard an SAP application to CWAI, refer to the AWS blog Monitor SAP Applications using Amazon CloudWatch Application Insights.
Let’s take a look at how to use CWAI to configure observability for the SAP Single Points of Failure (SPOFs): SAP Central Services (message server and enqueue processes), SAP Application Server, NFS (shared storage), Database, and SAP Web Dispatcher.
Step-1: Incident Management
Configure observability for SAP Central Services
CWAI for SAP can monitor SAP ASCS and ERS cluster nodes. It indicates when an ASCS failure results in fence actions that move ASCS to the healthy node (ERS). It also shows network disruptions between the SAP ASCS and ERS cluster nodes that cause heartbeat failures.
Figure-2: SAP HA metrics dashboard for SAP ASCS and ERS cluster
You can view the SAP HA metrics sap_HA_get_failover_config_HAActive and sap_HA_check_failover_config_state, which indicate the configuration state of HA, to confirm that the cluster is ready to fail over when a failure is detected.
Figure-3: SAP HA metrics dashboard with Cluster Connector Metrics
Configure observability for SAP application servers
You can configure CWAI to monitor the availability of SAP NetWeaver application servers (i.e. the status of PAS and AAS). It highlights system-level errors, system exceptions, and the status of processes (i.e. message server, enqueue server, igw, icman, gwrd, disp+work) that affect the availability of SAP systems.
Figure-4: SAP NetWeaver Availability dashboard
Configure observability for SAP database
CWAI for SAP can be configured to monitor HANA database availability, including HSR between the primary and standby database nodes. It provides information about network disruptions between database nodes that may cause failover events and failures to reconnect to the secondary database node. The dashboards can also show abnormal spikes in database transactions that delay HSR log shipping.
Figure-5: SAP HA metrics dashboard for SAP HANA database cluster
Figure-6: SAP HSR Metric dashboard for SAP HANA SYSTEMDB and HDB
Configure observability for SAP Web Dispatcher
Amazon CloudWatch supports monitoring Application Load Balancer (ALB) availability and EC2 instance availability. To learn more, see the Amazon CloudWatch documentation on ALB monitoring metrics and on the metrics for monitoring the status of EC2 instances.
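To make this more concrete, here is a minimal Python sketch that builds the parameters for two CloudWatch alarms: one on the ALB target group metric UnHealthyHostCount (namespace AWS/ApplicationELB) and one on the EC2 metric StatusCheckFailed (namespace AWS/EC2). The alarm names, load balancer and target group identifiers, instance ID, SNS topic ARN, periods, and thresholds are hypothetical placeholders that you would replace with your own values. The functions only build dictionaries, and the comment at the end shows how they could be passed to the boto3 put_metric_alarm call.

```python
def unhealthy_host_alarm(name, load_balancer, target_group, sns_topic_arn):
    """Parameters for an alarm that fires when any Web Dispatcher target is unhealthy."""
    return {
        "AlarmName": name,
        "Namespace": "AWS/ApplicationELB",
        "MetricName": "UnHealthyHostCount",
        "Dimensions": [
            {"Name": "LoadBalancer", "Value": load_balancer},
            {"Name": "TargetGroup", "Value": target_group},
        ],
        "Statistic": "Maximum",
        "Period": 60,             # seconds; assumed evaluation granularity
        "EvaluationPeriods": 2,   # assumed: two consecutive breaching periods
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }


def status_check_alarm(name, instance_id, sns_topic_arn):
    """Parameters for an alarm that fires when an EC2 status check fails."""
    return {
        "AlarmName": name,
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 2,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }


# Hypothetical identifiers, for illustration only.
topic = "arn:aws:sns:us-east-1:111122223333:sap-ha-alerts"
alb_alarm = unhealthy_host_alarm(
    "sap-webdisp-unhealthy-hosts",
    "app/sap-webdisp-alb/0123456789abcdef",
    "targetgroup/sap-webdisp/0123456789abcdef",
    topic,
)
ec2_alarm = status_check_alarm(
    "sap-webdisp-status-check", "i-0123456789abcdef0", topic
)

# To create the alarms (requires boto3 and AWS credentials):
#   import boto3
#   cloudwatch = boto3.client("cloudwatch")
#   cloudwatch.put_metric_alarm(**alb_alarm)
#   cloudwatch.put_metric_alarm(**ec2_alarm)
```

Keeping the parameter construction separate from the API call makes the alarm definitions easy to review and reuse across SAP systems.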
Step-2: Post-Incident root-cause analysis
CWAI automatically ingests logs and traces from SAP servers. By analyzing cluster logs, NetWeaver logs and traces, CWAI can automatically identify anomalies and initiate automated responses. CWAI uses log data from SAP components to identify and resolve possible disruptions to SAP availability. These capabilities help you troubleshoot and resolve issues with your SAP applications, and reduce MTTR.
The Problem Summary dashboard is useful for detecting performance issues with AWS infrastructure, such as EC2 server sizing, EBS storage IOPS and throughput, EFS traffic, and network throughput. It also reports common SAP availability problems, such as HSR failures, spikes in database traffic, failed application servers, and more. Figure-7 illustrates problems that CWAI automatically detected with the availability and performance of SAP systems in an HA deployment.
Figure-7: Problem Summary Dashboard
In this sample scenario, CWAI for SAP has automatically detected an ongoing issue with SAP availability. The dashboard provides a summary of the problems and insights into how to resolve them. To help ensure SAP availability, we advise activating machine-learning-based anomaly detection for metrics and setting up relevant alerts using CloudWatch Alarms. In the example below, the SAP PAS is reported as offline.
Figure-8: SAP Availability problem
The SAP availability metric sap_alerts_availability confirms that the SAP PAS status has changed from online to offline and that the issue persists, causing errors in the SAP system.
Figure-9: SAP Availability problem is related to SAP PAS status
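The kind of status change described above can also be extracted from metric datapoints programmatically. The sketch below scans a time-ordered series of sap_alerts_availability datapoints and returns the windows during which the instance was offline. The sample datapoints are invented, and the encoding (1 for online, 0 for offline) is an assumption made for illustration, so check the values your own metric reports before relying on it.

```python
from datetime import datetime, timedelta

# Hypothetical datapoints for sap_alerts_availability, oldest first.
# Assumption: 1 means the instance reports online and 0 means offline.
datapoints = [
    (datetime(2024, 5, 1, 10, 0), 1),
    (datetime(2024, 5, 1, 10, 1), 1),
    (datetime(2024, 5, 1, 10, 2), 0),
    (datetime(2024, 5, 1, 10, 3), 0),
    (datetime(2024, 5, 1, 10, 4), 0),
    (datetime(2024, 5, 1, 10, 5), 1),
]


def offline_windows(points):
    """Return (went_offline, back_online) pairs.

    back_online is None when the series ends while still offline.
    """
    windows = []
    offline_since = None
    for timestamp, value in points:
        if value == 0 and offline_since is None:
            offline_since = timestamp          # online -> offline transition
        elif value != 0 and offline_since is not None:
            windows.append((offline_since, timestamp))  # offline -> online
            offline_since = None
    if offline_since is not None:
        windows.append((offline_since, None))  # outage still ongoing
    return windows


for started, ended in offline_windows(datapoints):
    if ended is None:
        print(f"Offline since {started} (ongoing)")
    else:
        print(f"Offline from {started} to {ended} ({ended - started})")
```

The resulting windows can feed a post-incident review, for example to compare each outage duration with the agreed RTO.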
For more information on troubleshooting your SAP system, refer to the Amazon CloudWatch documentation sections on troubleshooting SAP NetWeaver, SAP HANA, and SAP ASE. To learn more about anomaly detection, read the AWS blog Operationalizing CloudWatch Anomaly Detection.
Step-3: Corrective and Preventive actions
To keep track of crucial metrics for SAP HA, we advise activating anomaly detection with adaptive thresholds and receiving alerts through Amazon SNS or Amazon EventBridge. To learn more about centralizing CloudWatch Alarms with Amazon EventBridge and AWS CloudFormation, please refer to the AWS blog, “How to centralize CloudWatch Alarms with Amazon EventBridge and AWS CloudFormation”.
In this example, the SAP Basis administrator enables anomaly detection for the sap_alerts_availability metric.
Figure-10: Anomaly Detection enabled for SAP PAS
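The same setup can be scripted. The following Python sketch builds the parameters for a CloudWatch anomaly detection alarm on sap_alerts_availability, using the ANOMALY_DETECTION_BAND metric math function and an SNS topic as the alarm action. The namespace, dimensions, alarm name, topic ARN, period, and band width are placeholders and assumptions, so look up the namespace and dimensions under which the metric appears in your CloudWatch console. The function only builds a dictionary, and the trailing comment shows how it could be passed to the boto3 put_metric_alarm call.

```python
def anomaly_alarm(name, namespace, metric_name, dimensions,
                  sns_topic_arn, band_width=2):
    """Parameters for an alarm that fires when the metric leaves its expected band."""
    return {
        "AlarmName": name,
        "ComparisonOperator": "LessThanLowerOrGreaterThanUpperThreshold",
        "EvaluationPeriods": 3,          # assumed; tune to your tolerance
        "ThresholdMetricId": "band",     # must match the Id of the band expression
        "Metrics": [
            {
                "Id": "m1",
                "ReturnData": True,
                "MetricStat": {
                    "Metric": {
                        "Namespace": namespace,
                        "MetricName": metric_name,
                        "Dimensions": dimensions,
                    },
                    "Period": 60,        # seconds; assumed granularity
                    "Stat": "Average",
                },
            },
            {
                "Id": "band",
                "Expression": f"ANOMALY_DETECTION_BAND(m1, {band_width})",
                "Label": f"{metric_name} (expected)",
                "ReturnData": True,
            },
        ],
        "AlarmActions": [sns_topic_arn],
    }


# Placeholder values, for illustration only.
alarm = anomaly_alarm(
    name="sap-pas-availability-anomaly",
    namespace="YOUR_METRIC_NAMESPACE",
    metric_name="sap_alerts_availability",
    dimensions=[{"Name": "YOUR_DIMENSION_NAME", "Value": "YOUR_DIMENSION_VALUE"}],
    sns_topic_arn="arn:aws:sns:us-east-1:111122223333:sap-ha-alerts",
)

# To create the alarm (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**alarm)
```

A wider band (a larger band_width) makes the alarm less sensitive, which can help reduce noise for metrics that fluctuate during normal operations.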
Estimated Cost for CloudWatch Application Insights for SAP
CloudWatch Application Insights sets up CloudWatch custom metrics, alarms, and logs for SAP when you onboard the application. Charges are incurred according to Amazon CloudWatch pricing. Let’s examine a sample scenario of an SAP system with an HA deployment that includes 10 Additional Application Servers (AAS), 1 Primary Application Server (PAS), 1 SAP ASCS and ERS cluster, and a HANA database cluster with HANA System Replication (HSR). Our sample system has a total of 15 EC2 instances with associated EBS volumes and EFS storage for SAP file shares.
| Item | Unit Cost | Features enabled | Total Cost |
| Cost per custom metric | $0.30 / month | 10 metrics | $3.00 / month |
| Cost per alarm | $0.10 / month | 10 alarms | $1.00 / month |
| Data ingestion cost | $0.05 / GB | 10 GB / month | $0.50 / month |
| SAP log storage cost | $0.03 / GB | 10 GB / month | $0.30 / month |
| Total cost for 1 EC2 instance | | | $4.80 / month |
| Total cost for SAP system with HA deployment (running on 15 EC2 instances) | | | $72.00 / month |
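The arithmetic behind the table can be reproduced with a few lines of Python, which is handy when you want to adjust the number of metrics, alarms, or instances for your own landscape. The unit costs and quantities below are copied from the sample scenario. They are not current list prices, so always check Amazon CloudWatch pricing for your Region.

```python
from decimal import Decimal

# (unit cost in USD, quantity per EC2 instance per month), from the sample table.
line_items = {
    "custom metrics": (Decimal("0.30"), 10),
    "alarms": (Decimal("0.10"), 10),
    "data ingestion (GB)": (Decimal("0.05"), 10),
    "log storage (GB)": (Decimal("0.03"), 10),
}


def monthly_cost_per_instance(items):
    """Sum of unit cost times quantity over all line items."""
    return sum(unit_cost * quantity for unit_cost, quantity in items.values())


def monthly_cost_for_system(items, instance_count):
    """Per-instance cost multiplied by the number of EC2 instances."""
    return monthly_cost_per_instance(items) * instance_count


per_instance = monthly_cost_per_instance(line_items)
system_total = monthly_cost_for_system(line_items, 15)

print(f"Per instance: ${per_instance} / month")       # $4.80 / month
print(f"15-instance system: ${system_total} / month")  # $72.00 / month
```

Decimal is used instead of float so that the currency amounts add up exactly.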
For more information on Amazon CloudWatch Application Insights pricing, please refer to the Pricing section in the CWAI documentation.
Conclusion
To minimize disruption to an SAP system, you should ensure multiple layers of coverage, including observability of the key components of the system. With Amazon CloudWatch Application Insights for SAP, you can enable full-stack observability for your SAP HA deployment in minutes and start using out-of-the-box features such as pre-built dashboards, pre-configured metrics and alarms, and automated anomaly detection to proactively manage SAP availability.
CWAI can create easy-to-use, “single pane of glass” dashboards for SAP that can be shared across your organization, so that your whole team can track the health of your SAP HA solutions. These dashboards contain metrics for end-to-end monitoring of an SAP system with an HA deployment. CloudWatch Dashboards can also be shared with users outside of your AWS account. To help ensure SAP availability, we advise that SAP Basis administrators collaborate with the operations team to maintain a collection of vital SAP HA metrics in a dedicated dashboard.
CWAI is available for SAP ASE, SAP NetWeaver, and SAP HANA running on Red Hat Enterprise Linux and SUSE Linux Enterprise Server operating systems. You can find more guidance on how to use this solution, including tutorials, in the Amazon CloudWatch Application Insights documentation, and hands-on guidance for implementing it in the workshop “Setup monitoring for SAP using Amazon CloudWatch Application Insights”. To learn more about observability for SAP workloads on AWS, read the AWS blog series “End-to-End Observability for SAP on AWS”, part 1 and part 2.
Join the AWS for SAP Discussion
In addition to your customer account team and AWS Support channels, AWS provides public question and answer forums on our re:Post site. Our AWS for SAP team regularly monitors the AWS for SAP topic for discussions and questions that we can help answer. If your question is not support-related, consider joining the discussion over at re:Post and adding to the community knowledge base by contributing your own questions and answers.
TAGS: #saponaws #awsforsap #cloudwatch #monitoring #observability #SAPHANA #SAPNetWeaver #Pacemaker #HighAvailability