AWS for Industries
Enhance monitoring and observability for AWS HealthOmics workflows
Accessing workflow run logs and metrics through individual Amazon Web Services (AWS) HealthOmics APIs can make it difficult to get a bird's-eye view of all your workflows and runs. Automating the ingestion of relevant HealthOmics workflow and run information can help you operate more efficiently.
Introduction
Omics data has the potential to transform how life sciences researchers and organizations identify and treat disease. Deriving useful insights from this data requires effective strategies to manage, analyze, and interpret it at high volume.
AWS HealthOmics is a purpose-built service that helps you store, query, and analyze genomic, transcriptomic, and other omics data to support large-scale analysis and collaborative research. HealthOmics workflows can streamline bioinformatics workflows at scale by abstracting the infrastructure—helping you focus on science. The most common users of this service include bioinformaticians, data scientists, researchers, and developers.
HealthOmics offers two types of workflows:
- Private workflows: Custom workflows let you bring your own bioinformatics scripts written in a supported workflow language.
- Ready2Run workflows: Prebuilt pipelines, based on common industry analyses, let you get started quickly without writing code.
As workflow runs scale in volume and complexity, to the point where manual operations become time-consuming, you need quick visibility into many aspects of your workflows. These aspects include cost and resource utilization, workflow optimization opportunities, and failure notifications, as well as metrics such as run volumes, user-based usage, and failure type classification.
We will demonstrate how to automate the ingestion of relevant HealthOmics workflow information and how this data can be used to build operational dashboards. These dashboards can be customized to meet your organizational requirements—enhancing monitoring and observability of HealthOmics workflows.
Solution overview
AWS HealthOmics provides events and metrics through its integration with Amazon CloudWatch Logs. These logs contain discrete run-level items with all the required information, which can be transformed and prepared for summarization and visualization. Ingesting this data into a data lake provides the foundation for using visualization tools to surface custom metrics, key performance indicators, and reports in easy-to-understand visuals.
The solution includes an event-driven notification system that can notify you of run failures as soon as they occur. The entire solution is available, through sample code, in our open-source repository on GitHub.
Figure 1: Reference architecture of the AWS HealthOmics workflows enhanced monitoring solution
This solution uses an event-driven architecture with a data lake to automate ingestion and provide the data foundation for all relevant run metrics and metadata that need to be visualized. The solution pushes updates from workflow runs to the dashboard on a predefined schedule. HealthOmics exposes multiple sources of information, which the solution ingests into a data lake so that custom dashboards can query and analyze them.
The solution works as follows:
1. Users (or automated systems) launch workflow runs on AWS HealthOmics. HealthOmics emits events to Amazon EventBridge, which we use to enable automation and integration with downstream services. We use run status events to capture data from the following data sources:
- HealthOmics workflow run status event: We specifically capture run status change events to keep track of all runs and their status in the data lake.
- HealthOmics manifest logs: HealthOmics publishes manifest logs to Amazon CloudWatch for each run. The logs provide high-level information about each run task, such as task status, start time, stop time, and the failure reason if the task failed. Run manifest logs also report resource utilization statistics that can help identify resource optimization opportunities.
- HealthOmics Run Analyzer output: Run Analyzer is a standalone open-source tool that parses manifest logs and uses other data sources and logic to provide useful insights. You can use some of the Run Analyzer outputs in your dashboards to monitor performance and cost of runs.
- HealthOmics workflows: The solution periodically ingests all HealthOmics workflows, and their version information, into the data lake to keep track of all available workflows. Run-level information can then be augmented with useful workflow metadata.
2. The solution uses individual AWS Lambda functions to process each of the data sources mentioned previously and transform the data into CSV or JSON format. Each Lambda function uploads its transformed CSV or JSON files, under a prefix specific to the data source, to a dedicated data lake location in Amazon Simple Storage Service (Amazon S3).
3. The solution creates an AWS Glue Data Catalog, which houses the tables of queryable data from the data sources. An AWS Glue crawler scans the S3 bucket location on a configurable schedule (for example, every 15 minutes). The crawler recognizes the file format and schema and populates the tables in the AWS Glue Data Catalog with new or changed data.
4. You can directly interact with the tables using Amazon Athena to inspect the data, build custom views, and experiment with queries that can power the dashboard. We recommend using AWS Lake Formation to manage access to these tables.
5. The solution includes instructions on how to use Amazon QuickSight to build observability dashboards to visualize the data and metrics that are important to you. You can customize your dashboards based on your organizational priorities.
6. Optionally, you can also use Amazon SageMaker notebook instances with this data if you want to use custom libraries or perform advanced interactive analysis.
7. In addition to dashboards, the solution creates an Amazon Simple Notification Service (Amazon SNS) topic. You can subscribe to this topic to receive workflow run failure notifications.
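As a rough illustration of steps 1, 2, and 7, the following sketch shows how a Lambda handler might filter a HealthOmics run status event, flatten it into a CSV row for the data lake, and decide whether a failure notification is needed. It is not taken from the GitHub repository. The pattern values (`aws.omics`, `Run Status Change`) reflect our understanding of the EventBridge event format and should be verified against the HealthOmics documentation. The `detail` field names, the column list, the `run_status` prefix, and all function names are assumptions made for illustration. The calls that would upload to Amazon S3 and publish to Amazon SNS are omitted so the example runs offline.

```python
import csv
import io

# EventBridge rule pattern for run status events (assumed values; verify
# against the HealthOmics EventBridge documentation).
RUN_STATUS_PATTERN = {
    "source": ["aws.omics"],
    "detail-type": ["Run Status Change"],
}

# Hypothetical column order for the run status table in the data lake.
COLUMNS = ["run_arn", "run_id", "status", "event_time", "account", "region"]


def matches_pattern(event, pattern=RUN_STATUS_PATTERN):
    """Return True if every top-level pattern key matches the event."""
    return all(event.get(key) in allowed for key, allowed in pattern.items())


def flatten_event(event):
    """Turn one run status event into a flat dict keyed by COLUMNS."""
    detail = event.get("detail", {})
    arn = detail.get("arn", "")
    return {
        "run_arn": arn,
        "run_id": arn.rsplit("/", 1)[-1] if arn else "",
        "status": detail.get("status", ""),
        "event_time": event.get("time", ""),
        "account": event.get("account", ""),
        "region": event.get("region", ""),
    }


def to_csv(rows):
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def s3_key(row, prefix="run_status"):
    """Build a hypothetical object key; one prefix per data source."""
    return f"{prefix}/{row['run_id']}_{row['status']}.csv"


def failure_message(row):
    """Return notification text for failed runs, or None otherwise."""
    if row["status"] != "FAILED":
        return None
    return f"HealthOmics run {row['run_id']} failed at {row['event_time']}"


# Made-up sample event for illustration only.
sample_event = {
    "source": "aws.omics",
    "detail-type": "Run Status Change",
    "account": "123456789012",
    "region": "us-west-2",
    "time": "2024-01-01T12:00:00Z",
    "detail": {
        "arn": "arn:aws:omics:us-west-2:123456789012:run/1234567",
        "status": "FAILED",
    },
}

if matches_pattern(sample_event):
    row = flatten_event(sample_event)
    print(s3_key(row))
    print(to_csv([row]), end="")
    print(failure_message(row))
```

Keeping the transformation in pure functions like these makes it straightforward to unit test the logic separately from the AWS SDK calls that a real handler would wrap around it.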
The solution uses AWS Cloud Development Kit (AWS CDK) to deploy all the resources to your AWS account. This makes it easy for you to quickly make and deploy changes as needed.
The solution does not create QuickSight visualizations and dashboards automatically. Instead, the GitHub repository provides step-by-step instructions for deploying the solution in your AWS account and for creating example visualizations and dashboards in QuickSight.
Figure 2: Reasons for run failures visualization
You can build multiple visuals with the available metrics in the data lake and present them together in a customized dashboard tailored to specific personas. These dashboards are initially private to the owner and can then be published and shared with other users in the account.
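Before building a visual such as the one in Figure 2, it can help to prototype the underlying aggregation on a small extract of the data. The following is a minimal sketch that counts failure reasons across task records. The records are made up, and the `status` and `failure_reason` field names are hypothetical rather than the actual schema of the data lake tables.

```python
from collections import Counter

# Made-up task records; "status" and "failure_reason" are hypothetical
# column names, not the actual manifest log schema.
sample_tasks = [
    {"run_id": "r1", "status": "FAILED", "failure_reason": "Out of memory"},
    {"run_id": "r2", "status": "COMPLETED", "failure_reason": ""},
    {"run_id": "r3", "status": "FAILED", "failure_reason": "Out of memory"},
    {"run_id": "r4", "status": "FAILED", "failure_reason": "Input file not found"},
]


def top_failure_reasons(records, limit=5):
    """Count failure reasons across failed records, most common first."""
    reasons = Counter(
        r["failure_reason"] or "Unknown"
        for r in records
        if r["status"] == "FAILED"
    )
    return reasons.most_common(limit)


for reason, count in top_failure_reasons(sample_tasks):
    print(f"{reason}: {count}")
```

The same grouping could be expressed as an Athena query or a QuickSight aggregation once the field names in your tables are known.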
Benefits
This solution addresses several challenges and business needs. Following are a few monitoring and observability benefits:
1. Improve performance and turnaround time: With enhanced monitoring, you can quickly react to workflow run failures. When a workflow run encounters an issue, notifications alert you so that you can promptly diagnose the problem and restart the run, reducing turnaround time. The solution also provides insights into resource utilization, so you can adjust workflow resource requirements to improve turnaround time further.
2. Root cause analysis: Transform troubleshooting from a reactive to a proactive approach. By surfacing the top reasons for workflow failures, you can systematically address recurring issues. A visualization might reveal that specific tool versions or computational environments consistently lead to failures, allowing you to standardize your approach and minimize future disruptions. This data-driven approach means you spend less time debugging and more time advancing scientific research.
3. Cost optimization: Rather than applying broad cost-cutting measures, you can make targeted decisions about resource allocation. The solution identifies tasks with underutilized resources and provides actionable recommendations. This approach helps ensure that cost optimization doesn't come at the expense of research quality, balancing financial prudence with computational effectiveness.
4. Administrative tracking: The solution gives you high-resolution visibility into your computational workflows. You can track metrics such as the number of runs for each workflow, user-specific usage patterns, and workflow run statuses across different research projects. By encouraging comprehensive tagging of workflow runs, you can create a rich, queryable dataset that supports strategic decision-making.
5. Streamlined benchmarking: You can conduct sophisticated comparative analyses with ease. By creating workflow variations—experimenting with different tool versions, computational resources, and parallelization strategies—you can build custom visualizations that compare costs and runtime performance. This approach supports continuous improvement, so you can iteratively refine your computational strategies.
6. Scalability: As research projects grow in complexity and scale, understanding computational limits becomes crucial. The monitoring solution provides visibility into AWS HealthOmics quota limits and potential bottlenecks. You can then proactively engage with AWS Support to request appropriate quota increases. This forward-looking approach helps ensure that computational infrastructure evolves alongside research ambitions.
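To make the cost optimization benefit more concrete, the sketch below flags tasks whose peak CPU or memory use falls below a chosen fraction of what was reserved. This is one simple heuristic, not the logic used by Run Analyzer or by the solution. The records are made up, and every field name and the 50% threshold are assumptions for illustration.

```python
# Made-up task metrics; all field names are hypothetical.
task_metrics = [
    {"name": "align", "cpus_reserved": 16, "cpus_max_used": 15.2,
     "memory_reserved_gib": 64, "memory_max_used_gib": 20.0},
    {"name": "sort", "cpus_reserved": 8, "cpus_max_used": 2.0,
     "memory_reserved_gib": 32, "memory_max_used_gib": 30.1},
    {"name": "call", "cpus_reserved": 4, "cpus_max_used": 3.8,
     "memory_reserved_gib": 16, "memory_max_used_gib": 14.5},
]


def utilization(used, reserved):
    """Return used/reserved as a fraction, or 0.0 if nothing was reserved."""
    return used / reserved if reserved else 0.0


def underutilized(records, threshold=0.5):
    """List (task name, [resources]) where peak use is below the threshold."""
    findings = []
    for task in records:
        ratios = (
            ("cpu", utilization(task["cpus_max_used"], task["cpus_reserved"])),
            ("memory", utilization(task["memory_max_used_gib"],
                                   task["memory_reserved_gib"])),
        )
        low = [resource for resource, ratio in ratios if ratio < threshold]
        if low:
            findings.append((task["name"], low))
    return findings


for name, resources in underutilized(task_metrics):
    print(f"{name}: consider reducing {', '.join(resources)}")
```

A threshold-based check like this is only a starting point; peak usage can vary with input size, so it is worth reviewing several runs before lowering a task's resource requests.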
Conclusion
We provided an overview of how you can set up a monitoring and observability solution to gain insights into your AWS HealthOmics workflow runs. This solution sends you automated email notifications when AWS HealthOmics runs fail. It also gives you better visibility into operational metrics and enhanced governance over your organization's use of HealthOmics.
Contact an AWS Representative to learn how we can help accelerate your business.