AWS for Industries

Bayer imaging FM classifies drug targets using Amazon SageMaker HyperPod

This blog is co-authored by: Marc Osterland, Lisa Schneider, Adrian Wolny, and Vladislav Kim from Bayer AG, Pharmaceuticals

The pharmaceutical industry is starting to adopt artificial intelligence (AI) foundation models (FMs) to enhance research and development workflows. Bayer Pharmaceuticals (a division of Bayer AG, a multinational pharmaceutical and agricultural company headquartered in Germany) wanted to extract insights from their development data sets, so their data scientists turned to Amazon Web Services (AWS) for assistance.

By harnessing the power of Amazon SageMaker HyperPod, Bayer trained and utilized new FMs in just a few short months. Their scientific team can now process vast amounts of biomedical imaging data, train sophisticated machine learning (ML) models, and identify promising drug candidates based on phenotypic signatures. As Bayer continues to innovate, their work with AWS helps to pave the way for faster, more efficient pharmaceutical R&D. We’ll explore how Bayer uses AWS services to transform their research processes and drive innovation in pharmaceutical development.

Cell Painting is a technology for utilizing fluorescent dyes in biological imaging. It has been widely adopted in the pharmaceutical industry for high-content screening (HCS) and for understanding how specific genetic, physiological, or drug-binding actions change cell function. The Cell Painting assay, paired with therapeutic molecules, can reveal subtle phenotypic changes in the drug discovery workflow, leading to mechanistic insights or new drug targets.

In digital pathology, FMs are addressing the scalability challenges faced by biopharma organizations that need to analyze millions of histopathological images. Traditionally, morphological profiling uses human-engineered features such as shape, size, and texture to obtain a vector representation of histopathological images. This is a labor-intensive computational process that does not generalize across datasets.
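To make the traditional approach concrete, the following is a minimal sketch of a hand-engineered feature extractor. It assumes a single segmented object given as a binary mask plus a matching intensity image, both as nested lists. The function name `handcrafted_features` and the choice of five features are illustrative assumptions, not the feature set used by Bayer or by any specific profiling tool.

```python
import math

def handcrafted_features(mask, intensity):
    """Return [area, bbox_height, bbox_width, mean_intensity, intensity_std]
    for the foreground pixels of one segmented object (hypothetical feature set)."""
    pixels = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not pixels:
        raise ValueError("mask contains no foreground pixels")
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    values = [intensity[r][c] for r, c in pixels]
    area = len(pixels)                                   # size
    height = max(rows) - min(rows) + 1                   # crude shape descriptor
    width = max(cols) - min(cols) + 1
    mean = sum(values) / area
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / area)  # crude texture proxy
    return [float(area), float(height), float(width), mean, std]

# Toy 3x4 example: a 2x2 object in the middle columns.
mask = [[0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
intensity = [[0, 10, 20, 0],
             [0, 30, 40, 0],
             [0, 0, 0, 0]]
features = handcrafted_features(mask, intensity)
print(features)
```

Every feature here depends on a segmentation mask and on a human deciding which measurements matter, which is the dependency that self-supervised FMs aim to remove.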

Instead of relying on traditional human-engineered feature extractors, AI models can process complex morphological data with remarkable consistency and speed. Initiatives like BigPicture and industry collaborations, such as the Bayer AG partnership with Aignostics, underscore the growing recognition that AI-powered analysis has become indispensable for modern pharmaceutical R&D.

The Challenge: Analyzing millions of large images

Training FMs for Cell Painting and digital pathology requires processing millions of images. By leveraging Amazon SageMaker HyperPod, Bayer research scientists trained multiple large-scale self-supervised imaging foundation models. They trained DINO and MAE (both from Meta AI) and SimCLR (available from Cornell University), with training data from the Cell Painting Gallery (available from the Registry of Open Data on AWS). As shown in Figure 1, these FMs learned to recognize features across three types of cell treatments.

Figure 1: DINO self-attention maps. Cell Painting image crops and self-attention maps of the DINO attention heads in the last layer. Example images for DMSO, FK-866, and NVS-PAK1-1. The color scale in the self-attention maps represents the level of attention from the DINO [CLS] token, with lighter areas indicating higher attention. DINO was trained on the multisource data with the ViT-S architecture. This image is courtesy of Scientific Reports and Bayer AG.

Similarly, the Bayer digital pathology and histopathology teams faced a significant challenge: analyzing millions of microscopy images to understand morphological and phenotypical changes. Bayer decided to explore whether large-scale self-supervised imaging FMs, which do not require image segmentation for morphological profiling, are a more efficient alternative.

These imaging foundation models can also be applied to other drug discovery workflows at Bayer, such as analyzing enormous (10,000 x 10,000 px) slides to help digital pathologists identify cancerous cells in human tissues. However, these imaging FMs require training on extensive datasets, a task that demands substantial computational resources and scalability.

Solution overview

Bayer needed a way to provide a flexible, high-performance environment for FM development and training. Enter Amazon SageMaker HyperPod: its seamless integration with current Bayer infrastructure makes it appear as another resource in their network. SageMaker HyperPod allowed for reservation of a cluster of four ml.p4de.24xlarge Amazon Elastic Compute Cloud (Amazon EC2) instances, each with 8 NVIDIA A100 GPUs, with 80 GB of GPU RAM for each device. The science team pre-trained the FMs with 50 TB of data from cell culture images and histopathological slides. They trained the FMs continuously for three weeks to learn features and segmentation patterns.
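As a quick back-of-the-envelope check, the figures above can be combined into aggregate cluster capacity. The sketch below uses only the numbers stated in this post. The per-GPU data share assumes one pass over the dataset is split evenly across all GPUs, which is an illustrative assumption rather than a description of how Bayer sharded its data.

```python
# Figures stated in the post.
INSTANCE_COUNT = 4        # ml.p4de.24xlarge instances
GPUS_PER_INSTANCE = 8     # NVIDIA A100 GPUs per instance
GPU_MEMORY_GB = 80        # GPU memory per device
DATASET_TB = 50           # pre-training data volume

def cluster_capacity(instances, gpus_per_instance, gpu_memory_gb):
    """Return (total GPU count, total GPU memory in GB) for the cluster."""
    total_gpus = instances * gpus_per_instance
    return total_gpus, total_gpus * gpu_memory_gb

total_gpus, total_gpu_memory_gb = cluster_capacity(
    INSTANCE_COUNT, GPUS_PER_INSTANCE, GPU_MEMORY_GB
)

# Assumption: one pass over the data is sharded evenly across all GPUs.
data_per_gpu_tb = DATASET_TB / total_gpus

print(f"GPUs in cluster:          {total_gpus}")
print(f"Aggregate GPU memory:     {total_gpu_memory_gb} GB")
print(f"Data per GPU per pass:    {data_per_gpu_tb} TB")
```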

SageMaker HyperPod provides a persistent, robust cluster for FM training and job queuing. It also provides a developer-friendly environment to debug and inspect the running jobs to verify the full utilization of GPU resources. Bayer uses SageMaker HyperPod to maintain deep infrastructure control. The builders securely connected using Session Manager (a fully managed tool of AWS Systems Manager) to manage the ml.p4de.24xlarge instances for advanced model training, infrastructure management, and debugging.

To maximize availability, SageMaker HyperPod maintains a pool of dedicated and spare instances, which minimizes downtime during critical node replacements. Through this ability, Bayer was able to automatically swap out any failing nodes and restart the model training from the last saved checkpoint. This freed up time for the Bayer Research team.
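Restarting from the last saved checkpoint depends on the training job being able to find that checkpoint when it is relaunched. The following is a minimal sketch of that lookup, assuming checkpoints are written to a shared directory with a hypothetical `checkpoint_step_<N>.pt` naming scheme. The names `latest_checkpoint` and `starting_step` are illustrative and are not part of SageMaker HyperPod or of Bayer's training code.

```python
import re
from pathlib import Path
from typing import Optional

# Hypothetical naming scheme: checkpoint_step_<N>.pt
CHECKPOINT_PATTERN = re.compile(r"^checkpoint_step_(\d+)\.pt$")

def latest_checkpoint(checkpoint_dir: Path) -> Optional[Path]:
    """Return the checkpoint file with the highest step number, or None."""
    if not checkpoint_dir.is_dir():
        return None
    best_step, best_path = -1, None
    for path in checkpoint_dir.iterdir():
        match = CHECKPOINT_PATTERN.match(path.name)
        # Compare steps numerically so step 2000 beats step 900.
        if match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), path
    return best_path

def starting_step(checkpoint_dir: Path) -> int:
    """Return the step to resume from: 0 on a fresh start,
    otherwise one past the last saved step."""
    checkpoint = latest_checkpoint(checkpoint_dir)
    if checkpoint is None:
        return 0
    return int(CHECKPOINT_PATTERN.match(checkpoint.name).group(1)) + 1
```

A training script would call `starting_step` at launch, so the same command works for both a first run and a restart after a node replacement.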

Bayer team members also needed observability tools to help monitor and manage the workload. The SageMaker HyperPod health monitoring agent continuously monitors the cluster and detects potential issues, including memory exhaustion, disk failures, GPU anomalies, kernel deadlocks, container runtime issues, and out-of-memory (OOM) crashes. Based on the underlying issue, the monitoring agent either replaces or reboots the node. Integration of SageMaker HyperPod with other observability services (such as Amazon Managed Service for Prometheus and Amazon Managed Grafana) offers the Bayer team deeper insights into cluster performance, health, and utilization. These services also help streamline development time.

Figure 2 is a high-level architecture diagram of the workflow Bayer researchers use with SageMaker HyperPod. It shows how the various cluster components interact with each other and with other AWS services, such as Amazon FSx for Lustre and Amazon Simple Storage Service (Amazon S3).

Figure 2: Reference architecture. Within the AWS Cloud environment, researchers and engineers interact with the head node of a SageMaker HyperPod Slurm cluster to submit compute jobs and manage cluster state on the self-healing GPU cluster. The Admin and Ops roles can interact with the cluster directly or through the head node. A customer AWS account contains the permissions, datasets, and checkpoints.

Benefits of FM workflows

The SageMaker HyperPod-powered workflow has already made a significant impact on the Bayer drug discovery process:

  • Can analyze data from 100,000 compounds in HCS experiments
  • Helps identify top therapeutic candidates from vast datasets
  • Analysis jobs run quickly

Overall, this new phenotypic imaging FM accelerates the Bayer drug discovery pipeline.
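One common way to shortlist candidates from FM-derived image embeddings is to rank compounds by how similar their embedding is to a reference phenotypic signature. The sketch below illustrates that idea with cosine similarity on toy vectors. The compound names, the two-dimensional embeddings, and the function `rank_compounds` are hypothetical, and the post does not state which similarity measure or ranking procedure Bayer uses.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_compounds(embeddings, reference, top_k=3):
    """Return the top_k (name, score) pairs, most similar to the reference first."""
    scored = [(name, cosine_similarity(vec, reference))
              for name, vec in embeddings.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]

# Toy data: hypothetical per-compound embeddings and a reference signature.
embeddings = {
    "compound_a": [0.0, 1.0],
    "compound_b": [0.6, 0.8],
    "compound_c": [1.0, 0.0],
}
reference_signature = [1.0, 0.0]

ranking = rank_compounds(embeddings, reference_signature, top_k=2)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In practice the embeddings would have hundreds of dimensions and would typically be aggregated per compound across many image crops before ranking.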

Looking ahead

As Bayer continues to push the boundaries of ML in drug discovery, they’re exploring various technical processes and mechanisms to do more data science with the same resources. The team has implemented dynamic workload scaling to accommodate growing demand from their 20-person team. The Bayer team is considering better ML experiment tracking with fully managed MLflow on Amazon SageMaker, as well as SageMaker training plans to schedule resources efficiently. The team is also exploring various Amazon SageMaker inference options, based on requirements, to serve their FMs to their digital pathology and histopathology teams.

Conclusion

Through the partnership with AWS, the Bayer Research team has been able to implement AI foundation model training to help accelerate their research findings. Bayer can now analyze data from 100,000 compounds in HCS experiments to identify top therapeutic candidates in a shorter timeframe than traditional solutions.

To learn more about foundation model training on AWS, please contact an AWS Life Sciences representative.

Mike Tarselli

Mike Tarselli is a Specialist Leader in Healthcare & Life Sciences Data and AI at AWS. He has spent 25+ years in the biopharma industry. As a leader in AI and data strategy, he works with scientific and technical teams to help them realize their vision, while embracing the fast pace and enormity of AI.

Ankit Anand

Ankit Anand is a Senior Foundation Models Go-To-Market (GTM) Specialist at AWS. He partners with top generative AI model builders, strategic customers, and AWS Service Teams to enable the next generation of AI/ML workloads on AWS. Ankit’s experience includes product management expertise within the financial services industry for high-frequency/low-latency trading and business development for Amazon Alexa.

Michael Hsieh

Michael Hsieh is a Principal AI/ML Specialist Solutions Architect. He works with HCLS customers to advance their ML journey with AWS technologies and his expertise in medical imaging. As a Seattle transplant, he loves exploring the natural beauty the city has to offer, such as the hiking trails, scenic kayaking in SLU, and the sunset at Shilshole Bay.

Stefan Appel

Stefan Appel is a Senior Solutions Architect at AWS. For 10+ years, he has helped enterprise customers adopt cloud technologies. Before joining AWS, Stefan held positions in software architecture, product management, and IT operations departments. He began his career in research on event-based systems. In his spare time, he enjoys hiking and has walked the length of New Zealand following Te Araroa.