Greenwood Genetic Center transforms genomic medicine on AWS

This is a guest post from the Greenwood Genetic Center (GGC), an Amazon Web Services (AWS) customer.

The Greenwood Genetic Center (GGC) is a nonprofit institute organized to provide clinical genetic services, diagnostic laboratory testing, educational programs and resources, and research in the field of medical genetics. Our vision is to be a Center of Excellence in Medical Genetics, serving as a resource for everyone who needs genetic services or information and working to reduce the prevalence and impact of genetic disorders.

To advance this vision, we introduced the GGC Precision Medicine Initiative. By harnessing advanced genomic technologies, interdisciplinary collaboration, and data-driven insights, the initiative aims to unlock the potential of personalized medicine, deepen our understanding of disease, and give individuals across South Carolina tailored interventions that improve health outcomes. The initiative is built on four main pillars: Access, Analysis, Answers, and Action. In this post, we focus on the Analysis pillar, specifically data analysis.

The medical records challenge

Over our 50-year history, the GGC has collected a vast library of health records. However, we lacked a systematic way of making these files accessible to clinicians and researchers. The GGC has data in multiple source systems that use different data models. The sources include clinical data from our electronic health record (EHR) systems; Epic electronic medical record (EMR) CSV extracts stored in an Access database; Visual FoxPro database files (DBFs); diagnostic data from our laboratory information management system (LIMS); genetic variant data in the form of Variant Call Format (VCF) files from our medical genetics record system (Medgis); and even paper charts.

Our legacy systems required physicians to manually sift through these various data sources. This was time-consuming and susceptible to human error, which increased the likelihood of incorrect results. Many patients travel to GGC facilities from other parts of the state, and search results often weren’t available until patients had left the clinic and returned home. This delayed further testing and medical management. Given these challenges, we knew we could improve the clinical experience for both our patients and providers.

Precision Medicine Initiative goals, objectives, and architecture

As a part of the GGC Precision Medicine Initiative’s Analysis pillar, our primary goal is to help providers deliver diagnoses and treatment plans with greater accuracy while a patient is still in the clinic. To make our medical records accessible to providers, we created a data warehouse to serve as a single repository for the GGC’s data, where our clinicians have access to an all-encompassing view of patients’ records. (You might argue that GGC’s solution is technically a data lake or lake house, but the GGC team refers to it as a data warehouse—so we use that term in this post.) The data warehouse allows our providers to query data more quickly, explore trends between patients, and ultimately to make faster and more accurate diagnoses.

The GGC had three objectives when building our data warehouse:

Transform our data to a common data model (CDM)
Create an all-encompassing view of the patient’s records using entity resolution
Empower clinicians to query and view this data quickly and easily

Data model and architecture

We built a system to transform our legacy data into a CDM, choosing the Fast Healthcare Interoperability Resources (FHIR) specification. FHIR is a standard that keeps resources consistent regardless of the data source.

The GGC’s data warehouse architecture is similar to the AWS Guidance for Multi-Modal Data Analysis with AWS Health and ML Services, but it has some key differences. The following diagram shows the GGC’s high-level architecture.

Figure 1. GGC Precision Medicine Initiative architecture

Objective 1: Transform our data to a common data model

We extract .xml data files from legacy systems and use AWS DataSync to move them to Amazon Simple Storage Service (Amazon S3) weekly. Every time a new file arrives in Amazon S3, the S3 event notifications feature invokes Python functions in AWS Lambda that transform the .xml file into its respective FHIR resource. These Lambda functions output NDJSON files with the transformed FHIR resources. Another Lambda function loads these FHIR resources into AWS HealthLake. HealthLake provides a pre-built AWS CloudFormation template that we use to extract the HealthLake data and put it into our AWS Lake Formation data lake.
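To make the transformation step concrete, here is a minimal sketch of the kind of logic such a Lambda function might contain. It is an illustration, not our production code: the legacy `<patient>` XML layout, its element names (`mrn`, `firstName`, `lastName`, `sex`, `dob`), the identifier system URI, and all function names are assumptions made for this example. It assumes the legacy date of birth is already in YYYY-MM-DD form and that `sex` holds a value FHIR accepts once lowercased. The S3 and HealthLake calls are left out so the sketch runs without AWS access.

```python
import json
import xml.etree.ElementTree as ET
from urllib.parse import unquote_plus


def s3_objects_from_event(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    pairs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def patient_xml_to_fhir(xml_text):
    """Map a hypothetical legacy <patient> element to a FHIR Patient resource."""
    root = ET.fromstring(xml_text)
    return {
        "resourceType": "Patient",
        # The identifier system below is a placeholder, not a real namespace.
        "identifier": [
            {"system": "urn:example:legacy-mrn", "value": root.findtext("mrn")}
        ],
        "name": [
            {
                "family": root.findtext("lastName"),
                "given": [root.findtext("firstName")],
            }
        ],
        "gender": root.findtext("sex", default="unknown").lower(),
        "birthDate": root.findtext("dob"),
    }


def to_ndjson(resources):
    """Serialize resources as newline-delimited JSON, one resource per line."""
    return "\n".join(json.dumps(r, separators=(",", ":")) for r in resources) + "\n"
```

In a real handler, the object named in the event would be downloaded, passed through a mapping function like `patient_xml_to_fhir`, and the NDJSON output written back to Amazon S3 for the loading step.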

For genomics data, we use AWS HealthOmics to combine .vcf files with annotations. AWS HealthOmics provides annotation stores so that common annotation sources, such as the Genome Aggregation Database (gnomAD) and the National Institutes of Health (NIH) ClinVar archive for variant classification, can be applied to the variant data. We reduce cost by using HealthOmics variant stores as a parser: HealthOmics reads the .vcf files, then Amazon Athena queries those records and stores them in Amazon S3. Lastly, we connect the variants and their annotations in an AWS Lake Formation data lake, where we can combine them with other medical records by running an Athena query.
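For readers less familiar with VCF, the following sketch shows what "parsing" and "annotating" mean at the record level. In our pipeline HealthOmics does the parsing and Athena does the join; this in-memory Python version is only an illustration. The function names and the shape of the `annotations` dictionary, keyed by chromosome, position, reference allele, and alternate allele, are assumptions made for this example.

```python
def parse_vcf_line(line):
    """Split one VCF data line into its eight fixed columns."""
    chrom, pos, vid, ref, alt, qual, flt, info = line.rstrip("\n").split("\t")[:8]
    info_map = {}
    for item in info.split(";"):
        if item and item != ".":
            # INFO entries are either KEY=VALUE pairs or bare flags.
            key, _, value = item.partition("=")
            info_map[key] = value if value else True
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": vid,
        "ref": ref,
        "alt": alt.split(","),  # multiple alternate alleles are comma-separated
        "qual": qual,
        "filter": flt,
        "info": info_map,
    }


def read_variants(lines):
    """Parse data lines, skipping blank lines and '#' header lines."""
    return [parse_vcf_line(l) for l in lines if l.strip() and not l.startswith("#")]


def annotate(variants, annotations):
    """Emit one row per alternate allele with its annotation, if any.

    `annotations` is a hypothetical lookup keyed by (chrom, pos, ref, alt).
    """
    rows = []
    for v in variants:
        for alt in v["alt"]:
            key = (v["chrom"], v["pos"], v["ref"], alt)
            rows.append({**v, "alt": alt, "annotation": annotations.get(key)})
    return rows
```

Splitting multi-allelic records into one row per alternate allele makes the join key unambiguous, which is the same reason a SQL join on chromosome, position, reference, and alternate columns works well.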

Objective 2: Create an all-encompassing view of a patient using entity resolution

To supply providers with an all-encompassing view of a patient, we must reconcile patient records from different sources in our data warehouse. AWS Entity Resolution provides a pre-built machine learning (ML) model for matching and deduplicating healthcare records. After comparing it with other entity resolution systems on the market, we found that this model provided stronger price-performance with respect to accuracy.
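As a rough illustration of what record matching involves, the sketch below groups records from different source systems by a normalized name and date of birth. This is a simple deterministic rule, not the ML model that AWS Entity Resolution applies, and it would miss cases such as typos or name changes that a trained model is designed to handle. The record fields (`source`, `id`, `family`, `given`, `birth_date`) and function names are assumptions made for this example.

```python
import unicodedata
from collections import defaultdict


def normalize_name(name):
    """Lowercase a name and drop accents, spaces, and punctuation."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return "".join(c for c in stripped.lower() if c.isalnum())


def match_key(record):
    """Build a deterministic key from normalized names and date of birth."""
    return (
        normalize_name(record["family"]),
        normalize_name(record["given"]),
        record["birth_date"],
    )


def group_records(records):
    """Group 'source:id' references that share the same match key."""
    groups = defaultdict(list)
    for r in records:
        groups[match_key(r)].append(r["source"] + ":" + r["id"])
    return list(groups.values())
```

Each resulting group lists the source records believed to belong to one person, which is the information a patient viewer needs to pull records together across systems.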

Objective 3: Empower providers to query and view data quickly and easily

GGC provides a custom web app for clinicians with several useful features, including:

A dynamic query builder that gives users the flexibility to create custom queries and returns all resources that meet the criteria. For example, a provider can create a query to show all female GGC patients who were born in a certain year, have a specific diagnosis, and were seen in a virtual visit. We use Amazon Athena to run custom queries.
A variant search, with which users can browse relevant patients based on the gene or regions of a gene.
A patient viewer, which shows an all-encompassing view of a patient including clinical encounters, diagnostic tests ordered, and lab reports.
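The dynamic query builder can be pictured as a function that turns a set of user-selected criteria into SQL. The sketch below is one possible shape for that idea, not our web app's implementation: the table name `patient_encounters`, the column names, and the `?` placeholder style are all assumptions made for this example, and the step that submits the query for execution is omitted.

```python
# Hypothetical mapping from user-facing criteria to SQL expressions.
# Restricting input to this allowlist keeps user text out of the SQL string.
ALLOWED_FIELDS = {
    "gender": "gender",
    "birth_year": "year(birth_date)",
    "diagnosis_code": "diagnosis_code",
    "visit_type": "visit_type",
}


def build_patient_query(criteria, table="patient_encounters"):
    """Return (sql, params) selecting patients that meet every criterion."""
    clauses, params = [], []
    for field, value in criteria.items():
        if field not in ALLOWED_FIELDS:
            raise ValueError(f"unsupported field: {field}")
        clauses.append(f"{ALLOWED_FIELDS[field]} = ?")
        params.append(value)
    sql = f"SELECT DISTINCT patient_id FROM {table}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params
```

For the example in the first feature above, the criteria might look like `{"gender": "female", "birth_year": 1990, "diagnosis_code": "<code>", "visit_type": "virtual"}`, where the values are placeholders rather than real data.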

Results and future work

Providers can now access the GGC’s entire library of patient health records, which will help with population studies, cohort recruitment, and clinical diagnostic support. In 2024, GGC transitioned our legacy EHR and laboratory management system data into our data warehouse. In December, we launched the newest version of our web app, where users have access to all data from these two systems, the variant data we’ve collected on patients, and entity resolution data that matches patients across our entire dataset. With better querying capabilities, clinicians can analyze the data, get answers, and take action while patients are still in the clinic.

We actively collect user feedback for ideas on enhancing the interface. We also monitor our audit trail to gauge user engagement. In the long term, we plan to digitize our paper records and clinical images and integrate them into the data warehouse. We expect that as we migrate more data into our data warehouse and enhance our analytic capabilities, user engagement will grow.

We’re encouraged by the Precision Medicine Initiative’s vision of enabling Access, Analysis, Answers, and Action. Our team is excited to see how our Precision Medicine Initiative will lead to faster and more accurate diagnoses and more effective treatments for our patients.

We would like to express our thanks to software developers Michael Mendes and Ben Weaver for their hard work designing and developing the GGC’s data warehouse, and to AWS solutions architects Scott Glasser and Sam Grace for their assistance with AWS services and for reviewing and providing valuable feedback on this blog post.

AWS Public Sector Blog
