AWS Public Sector Blog

Transforming Medicaid: Crafting a robust data strategy with AWS for operational efficiency

What if a Medicaid agency could make sure that their leadership and staff had access to a reliable, single source of truth for their data in one unified platform? The implementation of an operational data store (ODS) presents a viable solution. An ODS serves as a centralized repository that consolidates data from various Medicaid modules and applications into a single, integrated system. This approach not only enhances data integrity and accessibility but also streamlines operations, reduces redundancy, and improves the overall efficiency of Medicaid systems. As states continue to transition towards more modular architectures and look for efficiencies, adopting an ODS is vital in maintaining cohesive and effective data management.

State Medicaid agencies have been moving away from monolithic Medicaid systems and toward a more modular architecture. In this new reality, critical information is dispersed across numerous relational and NoSQL databases, each serving distinct modules within the Medicaid framework. Without a unified approach, there is no single source of truth, making it increasingly difficult for stakeholders to obtain a comprehensive view of Medicaid operations and data. This fragmentation can lead to inefficiencies, duplicated efforts, and potential errors in data analysis and decision-making. Further, as Medicaid agencies face budget and workforce shortages, they are trying to do more with less and maximize efficiency.

According to Gartner research from 2020, inconsistency in data across sources is the most challenging data quality problem. Storing and maintaining data in silos with significant overlaps, gaps, or inconsistencies contributes to these challenges, and if data sources are not connected, data standardization can become a blocker.

Approach

The traditional Medicaid Management Information System (MMIS) is a comprehensive system responsible for claims adjudication, financial transaction processing, decision support, pharmacy benefits management, contact centers, and the storage of provider and recipient data, alongside the implementation of business rules and reference data. As these functions are deployed into individual modules, data is distributed across various database types such as Amazon Aurora, Microsoft SQL Server, DB2 on mainframe, PostgreSQL, and Amazon Simple Storage Service (Amazon S3). Medicaid agencies must plan to ingest data from these diverse sources to create a unified ODS. The ODS should catalog data, perform quality checks, manage master data, and use a business glossary to produce actionable insights. This post explains the architectural choices available with Amazon Web Services (AWS) to effectively create and maintain such an ODS.

Consider a legacy MMIS like the one shown in the following figure. Functions such as claims and encounter processing, financial management, provider enrollment, and pharmacy benefits management are all handled by a single monolithic application. Many states still run such a system, where all the operational data resides within this monolithic application. Data is exchanged weekly or monthly between the monolithic application and an enterprise data warehouse (EDW). A traditional EDW is inflexible and takes months to accommodate changes or generate new reports. As the warehouse grows, performance starts to suffer and more hand-tuning is required to maintain it. More hardware is required, with lengthy and expensive procurement cycles. Jobs that once ran overnight leak into business hours, there is often no time to add new jobs, and the old ones almost never go away. Despite these limitations, the EDW still plays a critical role in delivering canned reports for the state agency. State agencies often rely on the operational data within the legacy MMIS databases for analytics and use the EDW for canned reports.

Figure 1. Legacy MMIS

With modularization of the Medicaid enterprise, as shown in the following figure, the monolithic application is broken into multiple smaller applications. Each application (for claims and encounters, provider, pharmacy, and so on) might be managed by a different vendor using a different database to store the corresponding data. The data still gets shipped to the EDW through the systems integrator (SI) module on a weekly or monthly basis for the established canned reports. But states now must join data across multiple modules to gather the operational insights they used to get from their monolithic application. As a result, state agencies are forced to depend on the EDW for operational analytics, and the EDW's rigid schema can add further delays in getting actionable insights.

Figure 2. Modularized Medicaid enterprise system

Because the data passes through the systems integrator module, states can capture it in an ODS in near real time before shipping it to an EDW. Under this model, states retain the ability to get operational insights from a data store with a flexible schema that receives near real-time data from each of the modules. This data store can be the single source of truth, enabling the agency to apply appropriate governance and make the data findable, accessible, interoperable, and reusable.

The EDW can still serve the canned reports, but states now have the opportunity to reevaluate the role of the EDW and ship to it only the data that requires complex query execution. The following diagram illustrates the modularized architecture with an ODS.

Figure 3. Modularized Medicaid enterprise system with ODS

The following architecture section delves into an approach for establishing this operational data store. Note that the operational data store doesn't need to reside in the systems integrator module. Regardless of where the ODS resides, the concept of getting near real-time data from each of the modules into the ODS before shipping the data to the EDW remains the same.

Architecture

Now we will dive deep into the architecture of the ODS. Health and human services (HHS) systems on AWS are designed to securely ingest, store, and process sensitive data, including personally identifiable information (PII), protected health information (PHI), and federal tax information (FTI). These systems demand stringent security and privacy controls, alongside adherence to specific regulatory compliance requirements such as HIPAA for PHI, Minimum Acceptable Risk Standards for Exchanges (MARS-E) for Affordable Care Act (ACA) administering entities, and IRS Publication 1075 for FTI processing. AWS Trusted Advisor provides checks to help customers maintain their security posture for regulated HHS workloads. HIPAA eligible services and FedRAMP compliant services are available in the AWS U.S. Regions. HHS workloads can be effectively isolated in dedicated accounts with data stored in U.S. Regions, using least-privilege IAM policies, MFA, and FIPS-compliant encryption for data at rest and in transit. You can read about running regulated workloads on AWS on this webpage.
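
As one illustration of these controls, the following is a minimal sketch, using the AWS SDK for Python (boto3), of a bucket policy that denies non-TLS access and unencrypted uploads for an ODS staging bucket. The bucket name is hypothetical, and the policy is only an example of the kind of guardrail an agency might apply.

```python
import json

import boto3

s3 = boto3.client("s3")
bucket = "example-medicaid-ods-staging"  # hypothetical bucket name

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Reject any request that does not use TLS
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        },
        {
            # Reject uploads that are not encrypted with AWS KMS
            "Sid": "DenyUnencryptedUploads",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
            "Condition": {
                "StringNotEquals": {"s3:x-amz-server-side-encryption": "aws:kms"}
            },
        },
    ],
}

s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))
```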

The architecture to establish ODS illustrates a modern data framework with decoupled components designed for seamless scaling and flexibility, allowing your agency to adapt quickly as analytical requirements and data volumes evolve over time. It contains the following elements:

  1. Data sources: Each module vendor uses its database of choice. The approach used to build the ODS should accommodate a wide variety of databases.
  2. Data ingestion: AWS Glue efficiently handles mainframe bulk data ingestion, and AWS Marketplace offers solutions for capturing delta changes. For relational databases, AWS Database Migration Service (AWS DMS) provides seamless transfer and ongoing replication (see the DMS sketch following this list). AWS DataSync facilitates smooth data movement from existing data lakes, and Amazon Simple Queue Service (Amazon SQS) enables near real-time ingestion from streaming sources through a decoupled messaging framework.
  3. Bronze data storage: Store raw data in straightforward, low-cost, highly resilient storage that you can always return to for auditing and provenance determinations. Amazon S3 Glacier storage classes archive historical raw data for long-term retention while keeping it retrievable for compliance requirements, data lineage verification, or reprocessing through your analytics pipeline (see the lifecycle sketch following this list).
  4. Catalog and data quality: A data catalog is essential in a modern data architecture because it serves as a centralized inventory that documents and organizes metadata about all data assets, making data discovery and understanding efficient across the organization. AWS Glue Data Quality uses machine learning (ML) to automate data quality management in data lakes, reducing manual effort from days to hours with automatic statistics, rule recommendations, monitoring, and alerts (see the ruleset sketch following this list). AWS Glue DataBrew is used to cleanse data.
  5. Silver data storage: Clean data has been validated and standardized but maintains its original granularity separately from raw data. This approach is essential for artificial intelligence and machine learning (AI/ML) applications that typically require access to cleaned but minimally transformed data while also facilitating transparent data lineage tracking throughout the data lifecycle.
  6. Data transformation and entity resolution: AWS Glue provides serverless extract, transform, and load (ETL) capabilities to transform and prepare data at scale, and AWS Entity Resolution identifies and resolves duplicate or conflicting records across different systems. The native transformation features of AWS Glue enable data normalization, aggregation, and enrichment, promoting data accuracy and preventing redundancy while maintaining consistent identifiers across the Medicaid ecosystem (see the Glue ETL sketch following this list).
  7. Operational data store (Gold Zone): Serves as the organization’s single source of truth by housing transformed, validated, and business-ready datasets in optimized formats, providing data quality, compliance, and governance while enabling efficient self-service analytics and ML applications through standardized schemas.
  8. Consumption layer (analytics and dashboards): Amazon Redshift delivers high-performance data warehousing for complex analytical queries across massive datasets, providing the computational power needed for business intelligence workloads. Amazon Athena offers serverless SQL querying directly against your Amazon S3 data lake, enabling immediate insights without data movement or infrastructure management (see the Athena sketch following this list). Amazon QuickSight transforms these analytical results into intuitive, interactive dashboards and visualizations that make data accessible to all stakeholders through its cloud-native business intelligence platform. This layer could eventually also house the state's EDW.
  9. Consumption layer (data sharing): Amazon DataZone creates a unified data management environment specifically designed for simplified data sharing and discovery. This platform provides a business-friendly data catalog where data producers can publish curated datasets with clear documentation, quality metrics, and usage policies. Consumers can easily discover, request access to, and integrate information through Amazon DataZone self-service capabilities, bridging organizational boundaries. The solution maintains end-to-end data lineage, enforces security policies, and supports regulatory compliance while providing a streamlined experience for data sharing across teams, departments, and external partners, creating a secure yet accessible data ecosystem for all stakeholders.
  10. Consumption layer (AI/ML): This layer represents the final stage in modern data architecture, where processed data transforms into actionable insights through enterprise data warehousing solutions and advanced AI capabilities. Amazon Bedrock provides foundation models (FMs) to power generative AI applications that can analyze patterns, predict outcomes, and automate decision processes with minimal coding. Combined with traditional analytics tools, this creates a comprehensive intelligence environment—enabling high-performance analytics, interactive dashboards, self-service reporting, and sophisticated AI/ML applications while maintaining security through role-based access controls and delivering customizable visualizations that support both business users and automated systems.
  11. Consumption layer (data collaboration): Modern data architecture enables secure cross-organizational data collaboration through governed data products and standardized data contracts. AWS Lambda functions paired with Amazon API Gateway create secure, scalable interfaces for programmatic data access, allowing partners and internal teams to consume specific datasets through well-defined APIs. This integration layer supports real-time data exchange without direct database access, maintaining security while providing the flexibility to support diverse consumption patterns. Fine-grained access controls, comprehensive audit logs, and standardized interfaces mean that data collaboration maintains governance requirements while supporting operational needs.
  12. Governance, security, and orchestration: AWS Lake Formation helps implement governance, security controls, and fine-grained access management (see the Lake Formation sketch following this list). Enterprise data security is paramount in protecting sensitive information across the data lifecycle through robust encryption, access controls, and audit mechanisms, supporting regulatory compliance (FedRAMP, HIPAA) through centralized key management, automatic data encryption at rest and in transit, comprehensive audit trails, and fine-grained access controls across all data environments.
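
The following sketches illustrate a few of these components; every resource name, ARN, schema, and path in them is hypothetical. For the ingestion step (item 2), a minimal boto3 sketch of defining and starting an AWS DMS task that performs a full load followed by ongoing change data capture (CDC) from a module's relational database might look like this.

```python
import json

import boto3

dms = boto3.client("dms")

# Hypothetical ARNs for a module's source database, the bronze-zone target,
# and the replication instance that runs the task.
source_arn = "arn:aws:dms:us-east-1:111122223333:endpoint:claims-source"
target_arn = "arn:aws:dms:us-east-1:111122223333:endpoint:bronze-s3-target"
instance_arn = "arn:aws:dms:us-east-1:111122223333:rep:ods-replication-instance"

# Replicate every table in the (hypothetical) claims schema.
table_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-claims-schema",
            "object-locator": {"schema-name": "claims", "table-name": "%"},
            "rule-action": "include",
        }
    ]
}

task = dms.create_replication_task(
    ReplicationTaskIdentifier="claims-to-bronze",
    SourceEndpointArn=source_arn,
    TargetEndpointArn=target_arn,
    ReplicationInstanceArn=instance_arn,
    MigrationType="full-load-and-cdc",  # initial load, then ongoing changes
    TableMappings=json.dumps(table_mappings),
)

# In practice, wait for the task to reach the ready state before starting it.
dms.start_replication_task(
    ReplicationTaskArn=task["ReplicationTask"]["ReplicationTaskArn"],
    StartReplicationTaskType="start-replication",
)
```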
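
For the bronze zone (item 3), a lifecycle rule can transition raw data to Amazon S3 Glacier storage classes after a retention window. A minimal sketch with a hypothetical bucket, prefix, and retention periods follows.

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bronze-zone bucket and prefix.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-medicaid-ods-bronze",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-raw-data",
                "Filter": {"Prefix": "raw/"},
                "Status": "Enabled",
                # Keep raw files in S3 Standard for 90 days, then archive.
                "Transitions": [
                    {"Days": 90, "StorageClass": "GLACIER"},
                    {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    },
)
```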
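
For the quality checks (item 4), AWS Glue Data Quality rules are written in the Data Quality Definition Language (DQDL). The sketch below registers a ruleset against a hypothetical cataloged claims table; the database, table, and column names are assumptions.

```python
import boto3

glue = boto3.client("glue")

# DQDL rules: completeness, uniqueness, and a simple value check.
ruleset = """
Rules = [
    IsComplete "member_id",
    IsUnique "claim_id",
    ColumnValues "paid_amount" >= 0
]
"""

glue.create_data_quality_ruleset(
    Name="claims-silver-ruleset",
    Description="Baseline checks before claims data enters the silver zone",
    Ruleset=ruleset,
    TargetTable={"DatabaseName": "medicaid_bronze", "TableName": "claims"},
)
```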
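
The transformation step (item 6) typically runs as an AWS Glue job. The following is a minimal PySpark sketch of a Glue ETL script that reads a cataloged claims table, normalizes column names, and writes Parquet to a hypothetical gold-zone location; the database, table, columns, and S3 path are assumptions.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read cleansed (silver-zone) claims data from the Data Catalog.
claims = glue_context.create_dynamic_frame.from_catalog(
    database="medicaid_silver", table_name="claims"
)

# Normalize column names and types before writing to the gold zone.
mapped = ApplyMapping.apply(
    frame=claims,
    mappings=[
        ("CLAIM_ID", "string", "claim_id", "string"),
        ("MBR_ID", "string", "member_id", "string"),
        ("PAID_AMT", "double", "paid_amount", "double"),
    ],
)

glue_context.write_dynamic_frame.from_options(
    frame=mapped,
    connection_type="s3",
    connection_options={"path": "s3://example-medicaid-ods-gold/claims/"},
    format="parquet",
)
job.commit()
```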
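
For the analytics layer (item 8), Amazon Athena can query the gold zone in place. A minimal sketch follows, using a hypothetical Data Catalog database, table, and query-results bucket.

```python
import time

import boto3

athena = boto3.client("athena")

query = athena.start_query_execution(
    QueryString="""
        SELECT provider_id, COUNT(*) AS claim_count, SUM(paid_amount) AS total_paid
        FROM gold_claims
        GROUP BY provider_id
        ORDER BY total_paid DESC
        LIMIT 20
    """,
    QueryExecutionContext={"Database": "medicaid_gold"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/ods/"},
)

# Poll until the query finishes, then fetch the result set.
execution_id = query["QueryExecutionId"]
while True:
    state = athena.get_query_execution(QueryExecutionId=execution_id)[
        "QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=execution_id)["ResultSet"]["Rows"]
    for row in rows:
        print([col.get("VarCharValue") for col in row["Data"]])
```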
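
For the governance controls (item 12), AWS Lake Formation can grant column-level access so analysts see only the fields they need. The sketch below grants SELECT on a subset of columns to a hypothetical analyst role; the role, database, table, and column names are assumptions.

```python
import boto3

lakeformation = boto3.client("lakeformation")

lakeformation.grant_permissions(
    Principal={
        "DataLakePrincipalIdentifier": "arn:aws:iam::111122223333:role/ods-analyst"
    },
    Resource={
        "TableWithColumns": {
            "DatabaseName": "medicaid_gold",
            "Name": "claims",
            # Analysts can query claim metrics but not member identifiers.
            "ColumnNames": ["claim_id", "provider_id", "paid_amount", "service_date"],
        }
    },
    Permissions=["SELECT"],
)
```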

The following diagram illustrates this architecture.

Figure 4. An approach to establish ODS

You can automate the coordination of complex data workflows by managing dependencies, scheduling jobs, and promoting reliable execution of data transformations, quality checks, and data movement across the layers of the platform. These automations improve scalability and operational efficiency by reducing manual steps, enforcing consistency, and providing audit trails and lineage tracking.
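
One way to implement this coordination (an assumption, since the post does not prescribe a specific orchestrator) is AWS Step Functions. The sketch below defines a minimal state machine that runs a hypothetical transformation job followed by a hypothetical quality-check job; the job names, state machine name, and role ARN are illustrative.

```python
import json

import boto3

sfn = boto3.client("stepfunctions")

# Hypothetical pipeline: run the transformation job, then the quality-check job.
definition = {
    "StartAt": "TransformToGold",
    "States": {
        "TransformToGold": {
            "Type": "Task",
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "ods-transform-job"},
            "Next": "RunQualityChecks",
        },
        "RunQualityChecks": {
            "Type": "Task",
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "ods-quality-check-job"},
            "End": True,
        },
    },
}

sfn.create_state_machine(
    name="ods-nightly-pipeline",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::111122223333:role/example-sfn-role",  # hypothetical role
)
```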

Conclusion

The transition from monolithic Medicaid systems to modular architectures presents both challenges and opportunities for state Medicaid agencies. The dispersion of critical information across various relational and NoSQL databases can lead to inefficiencies, duplicated efforts, and potential errors in data analysis and decision-making. However, the implementation of an ODS offers a robust solution to these issues. By consolidating data from disparate Medicaid modules and applications into a single, integrated system, an ODS gives a Medicaid agency access to a reliable, single source of truth in one unified platform. Such an operational data store gives states the ability to gain operational insights across their modules in near real time, while still using the EDW for complex-query-based canned reports.

As Medicaid agencies grapple with budget constraints and workforce shortages, adopting an ODS can help them maximize efficiency and maintain cohesive data management. The architectural choices available with AWS provide a flexible and scalable framework for creating and maintaining an ODS, which means that leadership and staff have access to a reliable, single source of data truth. In an era where doing more with less is imperative, the ODS stands out as a pivotal tool for states transitioning to more modular architectures in the cloud, enabling them to achieve greater efficiency and effectiveness in their Medicaid operations.

Health and human services agencies across the country are using the power of AWS to unlock their data, improve citizen experience, and deliver better outcomes. Learn more at Health and Human Services Cloud Resources. Learn how governments use AWS to innovate for their constituents, design engaging constituent experiences, and more by visiting the AWS Cloud for State and Local Governments hub.

Vignesh Srinivasan

Vignesh is a senior solutions architect at AWS. He previously worked with the Centers for Medicare & Medicaid Services (CMS), including helping to implement the Federal Health Exchange as part of the Affordable Care Act. He was also on the team that fixed healthcare.gov and successfully migrated the system to AWS. He has a master’s degree from Rochester Institute of Technology and an MBA from the University of Maryland.

Liz LeBreton

As a Medicaid leader at AWS, Liz advises states and stakeholders on how to achieve better Medicaid outcomes in the cloud. Prior to joining AWS, Liz oversaw federal funding and oversight for state Medicaid IT projects, authored regulations, and managed intergovernmental and external affairs at the Centers for Medicare and Medicaid Services (CMS) and the US Department of Health and Human Services.

Pramod Halapeti

Pramod is a senior solutions architect at AWS with extensive IT experience across development, leadership, architecture, and project management. He specializes in data modernization and AI/ML solutions, with proven expertise in healthcare, finance, transportation, and public sectors. As an AWS certified professional holding multiple certifications including Solutions Architect Professional and AI Practitioner, Pramod has architected innovative solutions that solve critical business challenges.