AWS for Industries

Enhance monitoring and observability for AWS HealthOmics workflows

Accessing workflow run logs and metrics through individual Amazon Web Services (AWS) HealthOmics APIs can make it difficult to get a bird's-eye view of all your workflows and runs. Automating the ingestion of relevant HealthOmics workflow and run information can help you operate more efficiently.

Introduction

Omics data has the potential to transform how Life Sciences researchers and organizations identify and treat disease. To derive useful insights from the data, effective strategies to manage, analyze, and interpret the high volume of omics data are needed.

AWS HealthOmics is a purpose-built service that helps you store, query, and analyze genomic, transcriptomic, and other omics data to support large-scale analysis and collaborative research. HealthOmics workflows can streamline bioinformatics workflows at scale by abstracting the infrastructure—helping you focus on science. The most common users of this service include bioinformaticians, data scientists, researchers, and developers.

HealthOmics offers two types of workflows:

  1. Private workflows: Custom workflows that let you bring your own bioinformatics scripts written in a supported workflow language.
  2. Ready2Run workflows: Prebuilt pipelines, based on common industry analyses, that let you start quickly without writing code.

As workflow runs scale in volume and complexity, to the point where manual operations become time-consuming, you will need quick visibility into many aspects of your workflows. These aspects include cost and resource utilization, workflow optimization opportunities, and failure notifications, as well as metrics like run volumes, user-based usage, and failure type classification.

We will demonstrate how to automate the ingestion of relevant HealthOmics workflow information and how this data can be used to build operational dashboards. These dashboards can be customized to meet your organizational requirements—enhancing monitoring and observability of HealthOmics workflows.

Solution overview

AWS HealthOmics provides events and metrics through its integration with Amazon CloudWatch Logs. These logs contain discrete run-level items with all the required information, which can be transformed and prepared for summarization and visualization. Ingesting this data into a data lake provides the data foundation for using visualization tools to surface custom metrics, key performance indicators, and reports in easy-to-understand visuals.
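As a rough illustration of this transformation, the sketch below flattens JSON log lines for one run into CSV rows. The column names (`name`, `status`, `startTime`, `stopTime`, `failureReason`) and the sample log lines are hypothetical placeholders, not the actual HealthOmics log schema, so check the real log format before adapting it.

```python
import csv
import io
import json

# Hypothetical column names; the real log schema may differ.
COLUMNS = ["run_id", "name", "status", "startTime", "stopTime", "failureReason"]


def log_lines_to_csv(run_id, lines):
    """Flatten JSON log lines for one run into a CSV string with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=COLUMNS,
        extrasaction="ignore",  # drop fields that are not in COLUMNS
        lineterminator="\n",
    )
    writer.writeheader()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        row = {"run_id": run_id}
        row.update(json.loads(line))
        writer.writerow(row)  # missing columns are written as empty strings
    return buffer.getvalue()


# Made-up sample data for illustration only.
sample_lines = [
    '{"name": "align", "status": "COMPLETED", '
    '"startTime": "2024-01-01T00:00:00Z", "stopTime": "2024-01-01T00:10:00Z"}',
    '{"name": "call_variants", "status": "FAILED", "failureReason": "out of memory"}',
]
csv_text = log_lines_to_csv("1234567", sample_lines)
print(csv_text)
```

A flat, fixed set of columns keeps the files easy to catalog and query later; nested fields would need to be flattened or serialized first.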

The solution includes an event-driven notification system that can notify you of run failures as soon as they occur. The entire solution is available, through sample code, in our open-source repository on GitHub.
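To make the notification idea concrete, here is a minimal sketch of how a failure message could be built from a run status event. It is not the code from the repository. The event shape (a `detail` object with `arn` and `status`) and the topic ARN are assumptions for illustration, and `publish` is an injected callable standing in for a call such as boto3's SNS `publish(TopicArn=..., Subject=..., Message=...)`.

```python
import json

# Assumption: only this status is treated as a failure worth notifying on.
FAILED_STATUSES = {"FAILED"}


def build_failure_notification(event):
    """Return (subject, message) for a failed run event, or None otherwise."""
    detail = event.get("detail", {})
    status = detail.get("status")
    if status not in FAILED_STATUSES:
        return None
    run_arn = detail.get("arn", "unknown-run")
    run_id = run_arn.rsplit("/", 1)[-1]
    # Amazon SNS limits email subjects to 100 characters.
    subject = f"HealthOmics run {run_id} {status}"[:100]
    message = json.dumps(
        {"run_arn": run_arn, "status": status, "time": event.get("time")},
        indent=2,
    )
    return subject, message


def notify_on_failure(event, publish, topic_arn):
    """Publish a notification for failed runs; return True if one was sent."""
    notification = build_failure_notification(event)
    if notification is None:
        return False
    subject, message = notification
    publish(TopicArn=topic_arn, Subject=subject, Message=message)
    return True


# Hypothetical event and topic ARN, with a fake publisher that records calls.
sent = []
failed_event = {
    "time": "2024-01-01T00:00:00Z",
    "detail": {
        "arn": "arn:aws:omics:us-east-1:111122223333:run/1234567",
        "status": "FAILED",
    },
}
notify_on_failure(
    failed_event,
    lambda **kwargs: sent.append(kwargs),
    "arn:aws:sns:us-east-1:111122223333:example-topic",
)
```

Passing the publisher in as a parameter keeps the message-building logic testable without AWS credentials.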

Figure 1: Reference architecture of the AWS HealthOmics workflows enhanced monitoring solution


This solution uses an event-driven architecture with a data lake, which provides the automation and the data foundation for all the run metrics and metadata you want to visualize. Updates from workflow runs reach the dashboard on a predefined schedule. HealthOmics exposes multiple sources of information; the solution lands them in the data lake, where custom dashboards can query and analyze them.
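The event-driven part can be sketched as a filter plus a small transformation. The pattern below reflects our understanding of how HealthOmics run status events appear in Amazon EventBridge (`source` of `aws.omics`, `detail-type` of `Run Status Change`); treat it, the `detail` field names, and the sample event as assumptions to verify against the EventBridge documentation. The `matches` helper is a deliberately simplified stand-in for EventBridge pattern matching, which supports far more than exact top-level values.

```python
# Assumed event pattern for HealthOmics run status events.
RUN_STATUS_PATTERN = {
    "source": ["aws.omics"],
    "detail-type": ["Run Status Change"],
}


def matches(pattern, event):
    """Simplified matcher: each top-level pattern key must hold a listed value."""
    return all(event.get(key) in allowed for key, allowed in pattern.items())


def to_run_record(event):
    """Reduce a run status event to the fields tracked in the data lake."""
    detail = event["detail"]
    return {
        "run_id": detail["arn"].rsplit("/", 1)[-1],
        "run_arn": detail["arn"],
        "status": detail["status"],
        "event_time": event["time"],
    }


# Hypothetical sample event for illustration.
sample_event = {
    "source": "aws.omics",
    "detail-type": "Run Status Change",
    "time": "2024-01-01T00:00:00Z",
    "detail": {
        "arn": "arn:aws:omics:us-east-1:111122223333:run/1234567",
        "status": "RUNNING",
    },
}

if matches(RUN_STATUS_PATTERN, sample_event):
    record = to_run_record(sample_event)
    print(record)
```

Keeping one record per status change, rather than only the latest status, preserves the run's history for later analysis.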

The solution works as follows:

1. Users (or automated systems) launch workflow runs on AWS HealthOmics. HealthOmics emits events to Amazon EventBridge, which we use to enable automation and integration with downstream services. We use run status events to capture data from the following data sources:

  • HealthOmics workflow run status event: We specifically capture run status change events to keep track of all runs and their status in the data lake.
  • HealthOmics manifest logs: HealthOmics publishes manifest logs to Amazon CloudWatch for each run. The logs provide high-level information about each run task (such as task status, start time, stop time, and, if the task failed, the failure reason). Run manifest logs also report resource utilization statistics that can be helpful for identifying resource optimization opportunities.
  • HealthOmics Run Analyzer output: Run Analyzer is a standalone open-source tool that parses manifest logs and uses other data sources and logic to provide useful insights. You can use some of the Run Analyzer outputs in your dashboards to monitor performance and cost of runs.
  • HealthOmics workflows: The solution periodically ingests all HealthOmics workflows, and their version information, into the data lake to keep track of all available workflows. Run-level information can then be augmented with useful workflow metadata.

2. The solution uses individual AWS Lambda functions to process each of the data sources mentioned previously and transform the data into CSV or JSON format. Each Lambda function uploads its transformed files, under the prefix for its data source, to a dedicated data lake location in Amazon Simple Storage Service (Amazon S3).

3. The solution creates an AWS Glue Data Catalog, which houses the tables of queryable data from the data sources. An AWS Glue crawler scans the S3 bucket location on a configurable schedule (for example, every 15 minutes). The crawler recognizes the file format and schema and populates the tables in the AWS Glue Data Catalog with new or changed data.

4. You can directly interact with the tables using Amazon Athena to inspect the data, build custom views, and experiment with queries that can power the dashboard. We recommend using AWS Lake Formation to manage access to these tables.

5. The solution includes instructions on how to use Amazon QuickSight to build observability dashboards to visualize the data and metrics that are important to you. You can customize your dashboards based on your organizational priorities.

6. Optionally, you can also use Amazon SageMaker notebook instances with this data if you want to use custom libraries or perform advanced interactive analysis.

7. In addition to dashboards, the solution creates an Amazon Simple Notification Service (Amazon SNS) topic that you can subscribe to in order to receive workflow run failure notifications.
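Steps 2 and 3 above can be sketched as follows: one S3 prefix per data source, and a crawler that scans those prefixes every 15 minutes. The prefix names, the date-partitioned key layout, and all resource names are hypothetical choices for illustration, not the layout used by the repository. `crawler_args` only builds keyword arguments in the shape accepted by boto3's Glue `create_crawler` call; it does not call AWS.

```python
from datetime import datetime, timezone

# Hypothetical prefixes, one per data source described in step 1.
PREFIXES = {
    "run_status": "run_status/",
    "manifest": "manifest/",
    "run_analyzer": "run_analyzer/",
    "workflows": "workflows/",
}


def object_key(source, name, extension, now=None):
    """Build an S3 key under the prefix for a data source, partitioned by date."""
    now = now or datetime.now(timezone.utc)
    return f"{PREFIXES[source]}dt={now:%Y-%m-%d}/{name}.{extension}"


def crawler_args(name, role_arn, database, bucket, minutes=15):
    """Keyword arguments for a Glue crawler that scans every data source prefix."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {
            "S3Targets": [
                {"Path": f"s3://{bucket}/{prefix}"} for prefix in PREFIXES.values()
            ]
        },
        # Glue cron fields: minutes, hours, day-of-month, month, day-of-week, year.
        "Schedule": f"cron(0/{minutes} * * * ? *)",
    }


# Example values are placeholders.
key = object_key(
    "manifest", "1234567", "csv", now=datetime(2024, 1, 2, tzinfo=timezone.utc)
)
args = crawler_args(
    "example-crawler",
    "arn:aws:iam::111122223333:role/example-glue-role",
    "example_database",
    "example-bucket",
)
print(key)
print(args["Schedule"])
```

Separate prefixes let the crawler create one table per data source, since files under a prefix share a schema.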

The solution uses AWS Cloud Development Kit (AWS CDK) to deploy all the resources to your AWS account. This makes it easy for you to quickly make and deploy changes as needed.

The solution does not create QuickSight visualizations and dashboards automatically. Instead, the GitHub repository walks through deploying the solution in your AWS account and creating some example visualizations and dashboards in QuickSight.

Figure 2: Reasons for run failures visualization

Figure 3: Runs by workflow

You can build multiple visuals with the available metrics in the data lake and present them together in a customized dashboard tailored to specific personas. These dashboards are initially private to the owner and can then be published and shared with other users in the account.
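A visual such as the reasons for run failures in Figure 2 boils down to a simple aggregate query. The sketch below shows one plausible query; the table name `runs` and its columns are hypothetical, not the schema the solution creates. The SQL uses only standard constructs that should also work in Amazon Athena, and it is run here against an in-memory SQLite table with made-up rows purely so the example is self-contained.

```python
import sqlite3

# Hypothetical table and column names.
FAILURE_REASONS_SQL = """
SELECT failure_reason, COUNT(*) AS failed_runs
FROM runs
WHERE status = 'FAILED'
GROUP BY failure_reason
ORDER BY failed_runs DESC, failure_reason
"""


def top_failure_reasons(rows):
    """Load sample rows into an in-memory table and run the aggregate query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE runs "
        "(run_id TEXT, workflow TEXT, status TEXT, failure_reason TEXT)"
    )
    conn.executemany("INSERT INTO runs VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(FAILURE_REASONS_SQL).fetchall()
    conn.close()
    return result


# Made-up sample data for illustration only.
sample_runs = [
    ("1", "wf-a", "COMPLETED", None),
    ("2", "wf-a", "FAILED", "out of memory"),
    ("3", "wf-b", "FAILED", "out of memory"),
    ("4", "wf-b", "FAILED", "file not found"),
]
reasons = top_failure_reasons(sample_runs)
print(reasons)
```

Swapping `failure_reason` for `workflow` and dropping the status filter would give the runs-by-workflow counts behind a visual like Figure 3.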

Benefits

This solution addresses several challenges and business needs. Following are a few monitoring and observability benefits:

1. Improve performance and turnaround time: With enhanced monitoring, you can quickly react to workflow run failures. When a workflow run encounters an issue, notifications alert you to promptly diagnose and restart workflows, thus reducing turnaround time. The solution provides insights into resource utilization, enabling workflow resource requirement adjustments to improve workflow turnaround time.

2. Root cause analysis: Transform troubleshooting from a reactive to a proactive approach. By surfacing the top reasons for workflow failures, you can systematically address recurring issues. A visualization might reveal that specific tool versions or computational environments consistently lead to failures, allowing you to standardize your approach and minimize future disruptions. This data-driven approach means you spend less time debugging and more time advancing scientific research.

3. Cost optimization: Rather than applying broad cost-cutting measures, you can make targeted decisions about resource allocation. The solution identifies tasks with underutilized resources and provides actionable recommendations. This approach helps ensure that cost optimization doesn’t come at the expense of research quality, balancing financial prudence with computational effectiveness.

4. Administrative tracking: The solution gives you high-resolution visibility into your computational workflows. You can track metrics (like number of runs for each workflow, user-specific usage patterns, and workflow run statuses) across different research projects. By encouraging comprehensive tagging of workflow runs, you can create a rich, queryable dataset that supports strategic decision-making.

5. Streamlined benchmarking: You can conduct sophisticated comparative analyses with ease. By creating workflow variations—experimenting with different tool versions, computational resources, and parallelization strategies—you can build custom visualizations that compare costs and runtime performance. This approach supports continuous improvement, so you can iteratively refine your computational strategies.

6. Scalability: As research projects grow in complexity and scale, understanding computational limits becomes crucial. The monitoring solution provides visibility into AWS HealthOmics quotas and potential bottlenecks. You can then proactively engage with AWS Support to request appropriate quota increases. This forward-looking approach helps ensure that computational infrastructure evolves alongside research ambitions.
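As a rough illustration of the cost optimization benefit, the sketch below flags tasks whose peak CPU or memory use stayed well below what was reserved. The field names, the 50% threshold, and the sample numbers are all hypothetical; the actual fields in manifest logs or Run Analyzer output may be named differently. The sketch also assumes every task reserves a nonzero amount of CPU and memory.

```python
def underutilized_tasks(tasks, threshold=0.5):
    """Return tasks whose peak CPU or memory use is below `threshold` of the reservation."""
    findings = []
    for task in tasks:
        cpu_ratio = task["cpus_max"] / task["cpus_reserved"]
        mem_ratio = task["mem_max_gib"] / task["mem_reserved_gib"]
        if cpu_ratio < threshold or mem_ratio < threshold:
            findings.append(
                {
                    "task": task["name"],
                    "cpu_ratio": round(cpu_ratio, 2),
                    "mem_ratio": round(mem_ratio, 2),
                }
            )
    return findings


# Made-up task statistics for illustration only.
sample_tasks = [
    {"name": "align", "cpus_reserved": 8, "cpus_max": 6.0,
     "mem_reserved_gib": 32, "mem_max_gib": 8.0},
    {"name": "sort", "cpus_reserved": 4, "cpus_max": 3.0,
     "mem_reserved_gib": 8, "mem_max_gib": 6.0},
]
findings = underutilized_tasks(sample_tasks)
print(findings)
```

Here the `align` task is flagged because it peaked at a quarter of its reserved memory, which suggests its memory request could be lowered.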

Conclusion

We showed how you can set up a monitoring and observability solution to gain insights into your AWS HealthOmics workflow runs. This solution sends you automated email notifications when AWS HealthOmics runs fail. It also provides better visibility into operational metrics and enhanced governance over your organization’s use of HealthOmics.

Contact an AWS Representative to learn how we can help accelerate your business.


Kayla Taylor


Kayla Taylor is a Solutions Architect specializing in Healthcare and Life Sciences based out of Northern Virginia. Her expertise and passion for learning have enabled her to support enterprise Healthcare and Life Sciences customers across North America. Kayla has a background in bioinformatics, where she has done extensive stem cell, metabolic, and genomics research.

Denny Daugherty


Denny Daugherty is a Principal Technical Account Manager specializing in Healthcare and Life Sciences. He brings a background in infrastructure management, systems architecture, application development, and technology leadership to support global enterprises while serving as an industry leader in AWS Enterprise Support. He is motivated by how AWS empowers customers to create technology innovations that advance health and well-being.

Nadeem Bulsara


Nadeem Bulsara is a Principal Solutions Architect at AWS specializing in Genomics and Life Sciences. He brings his 13+ years of Bioinformatics, Software Engineering, and Cloud Development skills as well as experience in research and clinical genomics and multi-omics to help Healthcare and Life Sciences organizations globally. He is motivated by the industry’s mission to enable people to have a long and healthy life.

Sujaya Srinivasan


Sujaya Srinivasan is a Solutions Architect specializing in Genomics and Life Sciences. She has a strong background in both technology and bioinformatics, and has more than a decade of experience working in oncology, clinical genomics, and pharma. She is passionate about using technology to accelerate research and discovery in life sciences, genomics, and precision medicine.