AWS Public Sector Blog
Getting drugs to market faster through better health data management on AWS

The data challenge and opportunities
Whether it’s a new innovative medicine which follows a journey from lab to patient or a generic or medical product which goes from a pharmaceutical manufacturing facility to a patient, ensuring faster discovery, delivery, and availability of medicines is both critical and challenging. The COVID-19 pandemic demonstrated how supply strain of essential medical products or the discovery and delivery of vaccines impacted our lives.
Post-pandemic medicine shortages have worsened across the European Union (EU). For example, many EU countries have reported a 50 percent year-on-year (YoY) increase in prescription medicine shortages across many therapeutic groups. In the Netherlands, 2,292 medicines were marked unavailable for at least two weeks in 2023, representing around 15 percent of all types, including drugs needed for cancer, epilepsy and Parkinson’s disease. These affected medicines were out of stock for 107 days on average in 2023, compared to 91 days in 2022. In Switzerland, 786 missing products, including 360 active ingredients such as antibiotics, tranquilizers, vaccines and injection solutions for diabetics were reported in 2023. Data plays a pivotal role in supply, demand, and real-time monitoring of the drug supply chain by medicine authorities.
In 2023, 55 new drugs were approved by the U.S. Food and Drug Administration (FDA). Research and development (R&D) is the first and most critical stage in a medicine’s journey from lab to patient. On average, the R&D cost per new drug is approximately $1.5 billion. Thousands of substances are investigated every year by pharmaceutical and biotechnology companies, health providers, and academic medical centers (AMCs) for their potential to treat diseases. Only a small number of substances are promising enough to be tested in patients, and just a fraction of these will ever produce study results good enough to reach patients. Data is a critical part of a new drug’s product lifecycle management and journey.
The traditional, centralized, and siloed approach of data management in the healthcare and life sciences (HCLS) industry has proved inadequate, unable to keep up with the dynamic nature of data and the evolving needs and challenges of new drug development and supply chain issues.
Data mesh to the rescue
Data mesh is an architectural framework that enables organizations to redefine how they manage and utilize their data assets. It embraces a decentralized, domain-driven approach, where individual business units or “domains” take responsibility for the data they generate and consume. This shift in ownership and governance is underpinned by the principle of data as a product (DaaP), where data is treated as a valuable, high-quality asset, just like any physical product an organization offers.
By adopting a DaaP mindset, pharmaceutical companies can unlock a host of benefits. Domain experts, empowered to own and manage their data, can confirm its accuracy, completeness, and relevance to their specific stakeholders, leading to more informed decision-making and better outcomes throughout the drug development lifecycle. The self-serve, agile nature of the data mesh also enables faster adaptation to evolving business requirements and regulatory changes, accelerating the path to market for new treatments.
Data mesh principles
 
 
Figure 1. The four principles of data mesh architecture.
Importantly, the data mesh model supports robust data governance and compliance, as data domains can closely align their data management practices with GxP guidelines, ISO 27001 standards, and other industry-specific regulations. This, in turn, builds trust among internal and external stakeholders, strengthening the pharmaceutical industry’s reputation and long-term viability.
In this post, we explore how HCLS organizations can embrace the data mesh and DaaP principles to unlock the full potential of their health data, drive faster and more efficient drug development, and ultimately, bring life-saving treatments to patients more quickly. We also showcase the Amazon Web Services (AWS) services that support the journey towards effective data management and alignment with data mesh principles.
The DaaP approach
The DaaP approach is a strategic mindset and operational model that treats data as a valuable asset and product within an organization. It involves adopting processes, practices, and technologies to manage, package, and deliver data in a way that maximizes its value and usability for the consumers while enforcing security and governance.
This approach has important implications for organizations that handle protected health information (PHI) and must comply with ISO 27001, GxP guidelines, and the Health Insurance Portability and Accountability Act (HIPAA).
HIPAA sets strict regulations around the use, disclosure, and safeguarding of PHI, which includes any individually identifiable health information. When treating data as a product, organizations must verify that their data management practices, processes, and products adhere to HIPAA requirements. ISO 27001 emphasizes the importance of identifying and assessing information security risks, so that organizations implement a risk management process and develop mitigation strategies.
The key aspects of the DaaP approach include:
Product mindset: Viewing data as a product means treating it with the same rigor and discipline as any other product offering. This includes defining product requirements, managing product lifecycles, and ensuring product quality and consistency. This aligns with ISO 27001’s requirements for asset management, including identifying, classifying, and protecting information assets.
Data governance: Robust data governance policies and procedures are crucial for HIPAA compliance. This includes implementing administrative, technical, and physical safeguards to protect the confidentiality, integrity, and availability of PHI. Establishing data governance frameworks, policies, and processes helps verify data quality, security, privacy, and compliance. This includes data stewardship roles, metadata management, and data lineage tracking. It also aligns with ISO 27001’s requirements for defining and implementing an information security management system (ISMS), including establishing information security policies, procedures, and organizational structure.
Data productization: Transforming raw data into consumable and valuable data products by curating, enriching, and packaging data in a way that meets specific consumer needs. This may involve data cleaning, formatting, and creating data feeds, APIs, or dashboards. These processes must adhere to ISO 27001’s requirements to maintain tight governance, as described under data governance above.
Data marketplace: Creating an internal or external data marketplace where data products can be discovered, accessed, and consumed by authorized users or customers. This may involve catalogs, search capabilities, and self-service portals. This can prove to be a major challenge, because many security policies, procedures, and risk management activities must be in place to meet the ISO 27001 standard.
Continuous improvement: Adopting an iterative approach to data product development, incorporating feedback from consumers, monitoring usage metrics, and adapting to changing business needs and technological advancements. This also requires implementing a process for continuously improving the ISMS based on feedback from audits, security incidents, and changes in the threat landscape or business requirements.
Monetization: For external consumers, data products should have clear monetization strategies, whether through direct sales, subscription models, or value-added services. Understanding the target market and pricing models is crucial for successful commercialization.
The DaaP approach promotes a cultural shift within organizations, encouraging cross-functional collaboration, data literacy, and a data-driven mindset. It enables organizations to unlock the value of their data assets and drive innovation while ensuring data quality, security, and compliance.
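To make the data marketplace aspect more concrete, the following is a minimal in-memory sketch of a catalog in which data products can be published, discovered by keyword, and checked against an access policy. The class names, product names, and roles are all hypothetical and chosen only for illustration; this does not represent the API of any AWS service.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One data product listing in a hypothetical internal marketplace."""
    name: str
    domain: str
    description: str
    tags: set[str] = field(default_factory=set)
    allowed_roles: set[str] = field(default_factory=set)


class Marketplace:
    """In-memory catalog: publish, search, and authorization check."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def publish(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry

    def search(self, keyword: str) -> list[str]:
        # Discovery: match the keyword against name, description, and tags.
        kw = keyword.lower()
        return sorted(
            e.name
            for e in self._entries.values()
            if kw in e.name.lower()
            or kw in e.description.lower()
            or kw in {t.lower() for t in e.tags}
        )

    def can_access(self, name: str, role: str) -> bool:
        # Only roles listed on the entry are authorized consumers.
        entry = self._entries.get(name)
        return entry is not None and role in entry.allowed_roles


market = Marketplace()
market.publish(CatalogEntry(
    name="clinical_trial_enrollment",
    domain="clinical",
    description="Weekly enrollment counts per trial site",
    tags={"trials"},
    allowed_roles={"clinical_analyst"},
))
market.publish(CatalogEntry(
    name="medicine_stock_levels",
    domain="supply_chain",
    description="Daily stock levels per medicine",
    tags={"shortage", "supply"},
    allowed_roles={"supply_planner", "regulator"},
))

print(market.search("enrollment"))
print(market.can_access("medicine_stock_levels", "regulator"))
```

In a real deployment the catalog, search, and access grants would be handled by a managed governance service rather than application code; the sketch only shows how discovery and authorization are separate concerns of the same listing.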
When treating data as a product, it is essential to consider its characteristics to verify its value, usability, and effectiveness.
DaaP characteristics
 
 
Figure 2. The DaaP characteristics.
Quality: Data products must adhere to stringent quality standards, including accuracy, completeness, consistency, and timeliness. High-quality data is crucial for making informed decisions and providing valuable insights. If data products do not require individual identities, organizations should consider de-identifying PHI through approved methods, such as expert determination or safe harbor methods, to create HIPAA-compliant data products. This aligns with ISO 27001’s requirements for maintaining the confidentiality, integrity, and availability of information.
Discoverability: Data products should be easily discoverable and accessible to potential consumers, whether internal or external to the organization. Proper metadata management, cataloging, and search capabilities facilitate discoverability. Depending on the intended use of data products containing PHI, organizations may need to obtain appropriate consent or authorization from individuals before using or disclosing their PHI.
Usability: Data products must be presented in a user-friendly format, with clear documentation, appropriate data structures, and intuitive interfaces. This enhances the ease of use and enables consumers to derive value from the data efficiently.
Interoperability: Data products should be designed to integrate seamlessly with other systems, platforms, and applications. Adherence to industry standards and the use of open data formats promote interoperability and facilitate data exchange.
Security and privacy: Robust security measures and data governance policies must be in place to protect sensitive information and maintain compliance with relevant regulations, such as data privacy laws and industry-specific requirements. Data products containing PHI must adhere to HIPAA’s security and privacy rules as well as the security requirements of ISO 27001, including implementing appropriate technical, physical, and administrative safeguards to protect the confidentiality, integrity, and availability of the data.
Scalability: As data volumes and demand for data products grow, the infrastructure and processes supporting data products should be scalable to accommodate increasing workloads and user bases.
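As an illustration of the quality characteristic, two of the dimensions named above (completeness and timeliness) can be expressed as simple checks that a data product publishes alongside its data. The sketch below is an assumption about how such checks might look; the function names, field names, sample records, and the 24-hour freshness threshold are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def completeness(records: list[dict], required: list[str]) -> float:
    """Share of required fields that are present and non-empty across all records."""
    total = len(records) * len(required)
    if total == 0:
        return 1.0
    filled = sum(
        1 for r in records for f in required if r.get(f) not in (None, "")
    )
    return filled / total


def is_timely(last_updated: datetime, now: datetime, max_age: timedelta) -> bool:
    """True when the data product was refreshed within the allowed age."""
    return now - last_updated <= max_age


# Made-up sample records for a supply chain data product.
records = [
    {"medicine_id": "M1", "country": "NL", "days_out_of_stock": 12},
    {"medicine_id": "M2", "country": "", "days_out_of_stock": 30},
]
required_fields = ["medicine_id", "country", "days_out_of_stock"]

score = completeness(records, required_fields)  # 5 of 6 required values are filled
fresh = is_timely(
    last_updated=datetime(2024, 1, 1, 12, tzinfo=timezone.utc),
    now=datetime(2024, 1, 2, tzinfo=timezone.utc),
    max_age=timedelta(hours=24),
)
print(f"completeness={score:.2f}, timely={fresh}")
```

Publishing metrics like these with each data product lets consumers judge fitness for use before they build on it.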
Using AWS services can facilitate implementing DaaP and data mesh principles, which in turn can help organizations more easily comply with HIPAA, ISO 27001, and GxP guidelines. AWS gives customers the ability to architect and configure their systems on AWS in a way that conforms to the ISO 27001 standard, through its security features, services, and best practice guidance.
Enabling DaaP
Amazon DataZone is a data management service that makes it faster and easier for customers to catalog, discover, share, and govern their data stored across AWS, on premises, and third-party sources. With Amazon DataZone, administrators and data stewards who oversee an organization’s data assets can manage and govern access to data using fine-grained controls. These controls are designed to provide access with the right level of privileges and context. Amazon DataZone makes it easier for engineers, data scientists, product managers, analysts, and business users to access data throughout an organization so that they can discover, use, and collaborate to derive data-driven insights.
DaaP architectural considerations
 
 
Figure 3. DaaP architectural considerations.
Architectural quantum: A data product is the node on the mesh that encapsulates the three structural components required for its function, providing access to the domain’s analytical data as a product.
- Code: This includes:
         - (a) code for data pipelines responsible for consuming, transforming, and serving upstream data received from the domain’s operational system or an upstream data product
         - (b) code for APIs that provide access to data and schema, observability metrics, and other metadata
         - (c) code for enforcing access control policies, compliance, provenance, and so on
         - In this architecture, the code is managed in the code repository of each data producer platform.
- Data and metadata: Depending on the nature of the domain data and its consumption models, data can be served as files, tables, or graphs while preserving the same semantics. For data to be usable, the associated metadata and quality metrics are served by the data producer platform. For access control policies, AWS Lake Formation can be used.
- Infrastructure: The infrastructure component includes the storage and access methods needed to build, deploy, and run the data product’s code. In this architecture guidance, this can be managed by Lake Formation.
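The three components above can be sketched as a single unit. The following is a minimal illustration, assuming a hypothetical `DataProduct` class that bundles code (a pipeline transform and an access policy), data and metadata (schema and rows), and an infrastructure reference (a placeholder storage URI). None of these names come from an AWS API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DataProduct:
    """Hypothetical data product bundling code, data and metadata, and infrastructure."""
    name: str
    domain: str
    # Code: pipeline logic and access control policy.
    transform: Callable[[list[dict]], list[dict]]
    allowed_roles: set[str]
    # Data and metadata: schema plus the served rows.
    schema: dict[str, str]
    # Infrastructure: where the product is stored (placeholder URI, an assumption).
    storage_uri: str
    rows: list[dict] = field(default_factory=list)

    def ingest(self, upstream: list[dict]) -> None:
        # Consume upstream data and apply the domain's transformation.
        self.rows = self.transform(upstream)

    def read(self, role: str) -> list[dict]:
        # Enforce the access policy before serving data.
        if role not in self.allowed_roles:
            raise PermissionError(f"role {role!r} may not read {self.name}")
        return list(self.rows)

    def metadata(self) -> dict:
        return {"name": self.name, "schema": self.schema, "row_count": len(self.rows)}


def drop_incomplete(rows: list[dict]) -> list[dict]:
    """Example pipeline step: keep only rows with a site and an enrollment count."""
    return [r for r in rows if r.get("site_id") and r.get("enrolled") is not None]


enrollment = DataProduct(
    name="trial_enrollment",
    domain="clinical",
    transform=drop_incomplete,
    allowed_roles={"clinical_analyst"},
    schema={"site_id": "string", "enrolled": "int"},
    storage_uri="s3://example-bucket/clinical/trial_enrollment/",
)
enrollment.ingest([
    {"site_id": "S1", "enrolled": 12},
    {"site_id": "", "enrolled": 5},
])
print(enrollment.metadata())
```

Keeping the transform, policy, schema, and storage reference in one object mirrors the idea that a data product is the smallest independently deployable unit on the mesh.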
Temporal immutable data: The data store in the overall architecture needs to record the value history of data at different points in time to support the time travel capability of the data platform. This can be achieved by using an open table format on Amazon Simple Storage Service (Amazon S3) as the storage layer. Because metadata files track all changes to the data, it is possible to return to a point-in-time table state.
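The idea of time travel can be illustrated with a small in-memory sketch: every commit appends an immutable snapshot, and a reader asks for the table state as of a given time. Open table formats achieve this with metadata and data files in object storage; the `VersionedTable` class below is a hypothetical simplification and not the API of any such format.

```python
from datetime import datetime, timezone


class VersionedTable:
    """Append-only list of immutable snapshots, ordered by commit time."""

    def __init__(self) -> None:
        self._snapshots: list[tuple[datetime, tuple[dict, ...]]] = []

    def commit(self, rows: list[dict], committed_at: datetime) -> None:
        if self._snapshots and committed_at <= self._snapshots[-1][0]:
            raise ValueError("snapshots must be committed in increasing time order")
        # Copy the rows so later changes by the caller cannot alter history.
        self._snapshots.append((committed_at, tuple(dict(r) for r in rows)))

    def as_of(self, when: datetime) -> list[dict]:
        """Return the table state from the latest snapshot at or before `when`."""
        for ts, rows in reversed(self._snapshots):
            if ts <= when:
                return [dict(r) for r in rows]
        return []


t1 = datetime(2023, 1, 1, tzinfo=timezone.utc)
t2 = datetime(2023, 6, 1, tzinfo=timezone.utc)

stock = VersionedTable()
stock.commit([{"medicine_id": "M1", "units": 100}], committed_at=t1)
stock.commit([{"medicine_id": "M1", "units": 0}], committed_at=t2)

march = datetime(2023, 3, 1, tzinfo=timezone.utc)
print(stock.as_of(march))  # state as committed at t1
print(stock.as_of(t2))     # state as committed at t2
```

Because earlier snapshots are never modified, an auditor can reproduce exactly what a consumer saw at any past date.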
Multimodal access: The architecture should make data products consumable through a variety of options, such as Pub/Sub events, SQL access, and file access. This lets consumer applications access the data in whatever form they need and removes that overhead and complexity from downstream systems.
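As a rough sketch of multimodal access, the same data product can be exposed through three modes at once: event callbacks (standing in for Pub/Sub), SQL queries, and a file export. The example uses only the Python standard library (`sqlite3`, `csv`); the `MultimodalProduct` class, column names, and sample rows are hypothetical.

```python
import csv
import io
import sqlite3
from typing import Callable


class MultimodalProduct:
    """One dataset served as events, SQL, and CSV file content."""

    def __init__(self, columns: list[str]) -> None:
        # Column names are trusted, developer-supplied identifiers in this sketch.
        self._columns = columns
        self._db = sqlite3.connect(":memory:")
        self._db.execute(f"CREATE TABLE product ({', '.join(columns)})")
        self._subscribers: list[Callable[[dict], None]] = []

    def subscribe(self, handler: Callable[[dict], None]) -> None:
        # Event mode: handlers are called for every new row.
        self._subscribers.append(handler)

    def append(self, row: dict) -> None:
        placeholders = ", ".join("?" for _ in self._columns)
        self._db.execute(
            f"INSERT INTO product VALUES ({placeholders})",
            [row[c] for c in self._columns],
        )
        for handler in self._subscribers:
            handler(row)

    def query(self, sql: str, params: tuple = ()) -> list[tuple]:
        # SQL mode: consumers run parameterized queries against the table.
        return self._db.execute(sql, params).fetchall()

    def export_csv(self) -> str:
        # File mode: the full table serialized as CSV text.
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(self._columns)
        writer.writerows(self._db.execute("SELECT * FROM product"))
        return buf.getvalue()


events: list[dict] = []
product = MultimodalProduct(["medicine", "country", "stock"])
product.subscribe(events.append)
product.append({"medicine": "amoxicillin", "country": "NL", "stock": 0})
product.append({"medicine": "insulin", "country": "CH", "stock": 40})

out_of_stock = product.query("SELECT medicine FROM product WHERE stock = ?", (0,))
print(out_of_stock)
print(product.export_csv())
```

Serving all three modes from one product means downstream teams do not each have to build their own conversion layer.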
Types of data products: A data product can be designed with different mindsets depending on the usage type:
- Source-aligned: These are data assets, such as datasets, tables, and views, that generally have minimal transformation applied to the operational data sources.
- Consumer-aligned: These are generally refined data assets created by business experts and curated with domain knowledge.
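The difference between the two types can be shown with a short sketch: a source-aligned product keeps rows close to the operational records, and a consumer-aligned product derives a curated view from them. The records below are made up for illustration, and the function name is hypothetical.

```python
from collections import defaultdict

# Source-aligned: rows close to the operational system, minimal transformation.
# These values are invented sample data, not figures from any report.
source_rows = [
    {"medicine_id": "M1", "country": "NL", "days_out_of_stock": 10},
    {"medicine_id": "M2", "country": "NL", "days_out_of_stock": 20},
    {"medicine_id": "M3", "country": "CH", "days_out_of_stock": 30},
]


def average_shortage_days_by_country(rows: list[dict]) -> dict[str, float]:
    """Consumer-aligned view: average days out of stock per country."""
    by_country: dict[str, list[int]] = defaultdict(list)
    for row in rows:
        by_country[row["country"]].append(row["days_out_of_stock"])
    return {
        country: sum(days) / len(days)
        for country, days in sorted(by_country.items())
    }


consumer_view = average_shortage_days_by_country(source_rows)
print(consumer_view)
```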
 
To map these architectural considerations to the data mesh reference architecture and prescriptive guidance, refer to Data mesh on AWS for HCLS customers.
Conclusion
Lack of, or delayed, access to medical products has major implications for health and society, such as delayed treatment and diagnoses, strain on healthcare systems, and increased healthcare costs. Whether it’s a promising new innovation or an already available generic medicine, life sciences companies, healthcare organizations, and drug regulators invest significant resources to launch medicines and verify their availability for patients. Data plays a critical role in every stage of a medicine’s journey from lab or manufacturing facility to patient.
In the high-stakes world of healthcare and pharmaceutical development, data has become the cornerstone of faster drug discovery and delivery. However, traditional data management approaches often fall short, hindering the industry’s ability to keep pace with the exponential growth and evolving needs of health data. The principles of the data mesh, particularly the principle of DaaP, offer a transformative solution, unlocking the power of data to accelerate the path from discovery to market.
In this post, we outlined how HCLS customers can adopt the data mesh and DaaP framework and principles to establish robust and scalable data management with governance at the core. The data mesh principles empower domain teams to manage and share their data, fostering data ownership and transparency. Self-serve platforms and federated governance enable agile, collaborative data sharing that drives better decision-making and unlocks data potential. Visit AWS for Healthcare and Life Science to learn how AWS is helping customers and supporting the HCLS industry.