AWS for Industries
Genentech reimagines insights from unstructured data using generative Artificial Intelligence
The Commercial, Medical, and Government Affairs (CMG) team of Genentech, a member of the Roche Group, partners across the healthcare system to deliver better health outcomes and equitable access for patients. This team was looking to leverage rich unstructured data at scale, across various formats/modalities such as text, PDFs, PowerPoint slides, images, videos, and audio files to drive improved customer experience and business outcomes. Unstructured data due to its diverse formats, inconsistent organization, and lack of standardized structure makes it difficult to effectively search, analyze, and extract meaningful insights.
In this post, we show how the CMG team developed a robust product for processing unstructured data at scale to unlock the value contained in those data sources using Generative Artificial Intelligence (Gen AI) technology resulting in enhanced patient interactions, actionable insights, and increased productivity.
Problem Statement and Solution Approach
The Genentech CMG team works with large amounts of unstructured data, and processing it efficiently is a challenge for any organization. In order to make this process more efficient and scalable, they decided to leverage AI and automation. For instance, the information that customer field executives entered into customer relationship management tools was only partially leveraged for generating insights. Teams engaged in market research or competitive intelligence analysis encountered processes requiring substantial manual effort to curate the content. Subsequently, synthesizing and deriving insights from this data took weeks or even months. External vendor-based custom solutions were implemented, but with suboptimal outcomes.
To address the challenges of harnessing unstructured data at scale, the Genentech team developed a product called “Deepsense” that empowers teams with AI-driven insights derived from multiple data sources, presented in a user-friendly and consumable format. While ensuring compliance with security and regulatory requirements, Deepsense improves access to unstructured data and data-driven insights, enabling faster and more informed decision-making for customers and patients. Moreover, Deepsense also establishes a robust foundation that allows for the seamless integration of new data sources, fostering agility and adaptability in an ever-evolving digital landscape. Figure 1 depicts the conceptual view of Deepsense.
Figure 1. Conceptual view of the Deepsense Application
Solution Overview
Figure 2. Architecture Diagram for the Deepsense Application
Deepsense has a frontend that is powered by a web application hosted using Amazon Simple Storage Service (S3), an object storage service and Amazon CloudFront, a content delivery network service. From a data perspective, all of the unstructured data are stored in a dedicated data lake that utilizes Amazon S3. A data pipeline extracts and processes the data from the data lake followed by a machine learning pipeline. The machine learning pipeline utilizes Amazon SageMaker, a machine learning service, to perform topic modeling, text vectorization and sentiment analysis. The processed data is then loaded into Amazon OpenSearch Service, which aids in semantic and full-text search.
From a user perspective, when they login to the web application and try to search a topic, the application retrieves the insights from Amazon OpenSearch Service via a REST API and displays it to the user. Deepsense offers advanced filtering and search capabilities that enable users to filter data and generate summaries across records in real-time within seconds, or they can ask questions directly to the relevant data using chat capabilities powered by Retrieval Augmented Generation (RAG).
To generate better outputs for topic modeling, sentiment analysis and summarization tasks, an iterative development approach was employed to test and evaluate various machine learning models to ensure the highest quality outputs by leveraging Amazon SageMaker Studio.
With the advent of Generative AI, key insights that previously required custom, fit-for-purpose ML models are now being generated by Generative AI Large Language Models (LLMs) such as Anthropic Claude 3, and Anthropic Claude 3.5 Sonnet via Amazon Bedrock, a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies.
LLMs are used within Deepsense to derive customer sentiment, identify emerging themes and topics, generate concise summaries of data, and enable conversational workflows (Q&A and Chatting capabilities). Deepsense leverages Genentech’s proprietary domain specific data with LLMs to generate diverse content such as emails and visuals for campaigns and internal purposes. This encompasses crafting compelling narratives, visually informative graphics, and transcribing audio to contextualized text chunks for efficient retrieval. Such versatile applications streamline content creation and knowledge management, unlocking data-driven insights and continuous innovation across the company.
Based on guidelines from the security pillar of the AWS Well-Architected Framework, specific measures are incorporated into the design and implementation. Data is encrypted in transit and at rest as it moves through various layers of processing and storage as described above. By implementing the principle of least privilege access, Deepsense application is accessible to authenticated users only, and unauthorized access to the system is prohibited by design. For governance and audit requirements, comprehensive monitoring and logging capabilities of Amazon S3, Amazon Bedrock and Amazon OpenSearch Service are utilized.
Future of Deepsense
The Genentech Deepsense team has plans to seamlessly integrate unstructured and structured data sources, and enable interactions with end-users via a natural language interface. This powerful fusion will empower users to pose queries that not only consider unstructured data such as text, but also integrate with structured data. By bridging these two realms, Deepsense will deliver more comprehensive and robust insights, providing users with a holistic and contextualized understanding of the information at hand.
Conclusion
Genentech’s creation of Deepsense represents a significant advancement in unstructured data processing for the organization. Utilizing LLMs, Genentech tackles traditional AI challenges, facilitating enhanced patient interactions, actionable insights, and increased productivity.
Deepsense allows for democratized access to unstructured data, delivering AI-driven insights while also ensuring compliance and security. This initiative empowers decision-making across the company’s business focus areas.
About Genentech
Genentech is widely recognized as a pioneer in the biotechnology industry that transformed the way we approach and treat some of the most intricate and challenging health issues. As a member of the Roche Group, the company remains dedicated to pursuing breakthrough research, developing life-changing medicines, unlocking advances in data and technology, and partnering across society to take on systemic issues that stand in the way of better healthcare for all.
Additional Resources
To learn more about Generative Artificial Intelligence using Amazon Bedrock check out:
Contact an AWS Representative to know how we can help accelerate your business.