AWS for Industries
Simplifying Medical Imaging AI Deployments with NVIDIA NIMs and AWS Services
Most practicing clinicians are not yet fully benefitting from the efficiency and diagnostic advances that medical imaging artificial intelligence (AI) promises. Additionally, many AI scientists and engineers struggle with the practical aspects of incorporating AI inferences in clinical workflows, and providing a consistent end-user experience when scaling to support millions of studies per year.
Still, the clinical practices of radiology and digital pathology are being transformed by AI. To date, the US Food and Drug Administration (FDA) has approved 950 AI-enabled medical devices, 77% of which are in the radiology and pathology domains. The potential of AI is expanding rapidly as imaging foundation models unlock capabilities beyond what was possible with traditional computer vision approaches.
We will demonstrate how to streamline medical imaging AI deployments with NVIDIA NIM inference microservices and managed Amazon Web Services (AWS) services, including Amazon SageMaker, Amazon Elastic Kubernetes Service (Amazon EKS), and AWS HealthImaging.
NIM is an important new paradigm that provides easy-to-use microservices designed to accelerate the deployment of generative AI models across all industries. This includes medical imaging AI, where NIM microservices (such as VISTA-3D, a foundation model from NVIDIA) use easy-to-deploy containers to accelerate last-mile delivery of medical imaging AI applications.
Amazon SageMaker is a machine learning (ML) service, offering managed data processing, model training (including foundation models at scale), hyperparameter tuning, model inference and full MLOps capabilities.
Amazon EKS is a fully managed Kubernetes service running in the cloud and on premises, with integrated tooling through open-source standards.
The VISTA-3D NIM container has also been customized with a connector to AWS HealthImaging, a HIPAA-eligible, highly scalable, and cost-effective cloud service for storing medical imaging data. The integration accelerates medical imaging AI applications with sub-second image retrieval latencies at scale, powered by cloud-native APIs.
We’ll describe how AI developers can use this solution to build scalable and streamlined medical imaging AI applications that help practicing clinicians speed up their clinical workflows and improve their productivity. Our use case will be segmentation of organs in computed tomography (CT) images of the chest.
Solution overview
The NVIDIA VISTA-3D NIM has an encoder-decoder based foundation model, named Versatile Imaging SegmenTation and Annotation model (VISTA-3D), which can be used for zero-shot, or open vocabulary segmentation. VISTA-3D segments over 120 organs and structures in CT scans. It is easy to work with VISTA-3D, because it presents a model inference endpoint through the industry-standard REST APIs. The frontend FastAPI process routes HTTP requests to a backend model inference process, which is hosted on an open source NVIDIA Triton™ Inference Server that deploys and optimizes scalable and low latency AI model inferences on GPUs.
Figure 1. NVIDIA NIM container architecture includes the libraries and tooling for low latency AI model inference
The following architecture in Figure 2 demonstrates how to deploy the VISTA-3D NIM on Amazon SageMaker, integrating with data stored on HealthImaging at scale.
 Figure 2. Architecture diagram running NIM on Amazon SageMaker, integrating with data from AWS HealthImaging
The medical images in DICOM format will be staged in Amazon Simple Storage Service (Amazon S3) and imported to HealthImaging. The VISTA-3D NIM container will be downloaded from NVIDIA NGC™, NVIDIA’s repository of containers for AI/ML, metaverse and HPC applications. It will then be uploaded to a private repository with Amazon Elastic Container Registry (Amazon ECR), which is required for both SageMaker and Amazon EKS deployments.
SageMaker inference endpoints have built-in high availability, which means the NIM container will be deployed across multiple Availability Zones. You can also choose what type of hosting endpoints to use on SageMaker, like near real-time inference or asynchronous inference. You can also run a Jupyter notebook in SageMaker Studio, which makes it straightforward to deploy and manage the inference endpoint through AWS SDK for Python (Boto3).
It is also possible to deploy NIM containers on AWS with Amazon EKS using the architecture shown in Figure 3.
 Figure 3. Architecture diagram to run the NIM container on Amazon EKS
In this architecture, you can deploy the container in private subnets and leverage AWS PrivateLink for secure network traffic. You can also use AWS Identity and Access Management (IAM) for role-based access and permission control. AWS Load Balancer Controller with Helm and Amazon CloudWatch Observability agent have been packaged in this NVIDIA NIM on EKS automated deployment and will be installed at the same time.
Prerequisites
Visit the NVIDIA API Catalog VISTA-3D model page and click the “Build with this NIM” button. If you have not logged in, you will be prompted to enter an email address. You can use a business email address, which provides a 90-day NVIDIA AI Enterprise license, or a personal email address, which will allow you to join through the NVIDIA Developer Program membership.
Once logged in, you can click on the same “Build with this NIM” button, then “Generate API Key” button to get your API Key to download the NIM container, or under any of the code snippet tabs (Shell, Python, or Node), select “Get API Key”. Once you get the API Key, follow the VISTA-3D documentation for detailed instructions to pull and run the VISTA-3D model container.
To deploy the VISTA-3D container on AWS, first create an AWS account. For the Amazon SageMaker deployment, set up a SageMaker notebook instance and download the sample code from this GitHub repo. For the Amazon EKS deployment, first create an EKS cluster using the Data-on-EKS automation script. You can then use AWS CloudShell or a local terminal with the command line tools (for example, kubectl and Helm) to deploy the container to the EKS cluster.
Deployment Walkthrough
Our first step is to build a custom container from the NIM base image provided by NVIDIA. You can build the image using the Dockerfile from the GitHub repo listed in the prerequisites section. A Linux x86 environment is required to build the image. After that, create a private repository in Amazon ECR and push the container image to it. The customized container has a connector layer that can take medical images as input from either HealthImaging or Amazon S3.
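As a minimal sketch of the Amazon ECR step, the following helper composes the private image URI that the container would be tagged and pushed with, and creates the repository if it is missing. The account ID, Region, repository name, and tag used here are placeholders chosen for illustration, not values from the sample repo; the Docker build and push themselves are not shown.

```python
def ecr_image_uri(account_id: str, region: str, repository: str, tag: str = "latest") -> str:
    """Return the private Amazon ECR URI used to tag and push a container image."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def ensure_repository(repository: str, region: str) -> None:
    """Create the private repository if it does not exist yet (needs AWS credentials)."""
    import boto3  # imported lazily so the URI helper works without the SDK installed

    ecr = boto3.client("ecr", region_name=region)
    try:
        ecr.create_repository(repositoryName=repository)
    except ecr.exceptions.RepositoryAlreadyExistsException:
        pass  # nothing to do, the repository is already there


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(ecr_image_uri("123456789012", "us-east-1", "vista3d-nim", "v1"))
```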
When using Amazon S3 as the image source, the custom container layer will download the NIfTI or DICOM files from Amazon S3 using the Boto3 Python library. HealthImaging only supports DICOM images, which are stored in data stores, so you need to provide a DatastoreId and an ImagesetId (analogous to the bucket name and object name in Amazon S3).
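The Amazon S3 path can be sketched roughly as follows. This is an illustration of the idea rather than the connector code from the sample repo: parse_s3_uri and download_from_s3 are hypothetical helper names, and only the Boto3 download_file call is a real SDK method.

```python
from typing import Tuple
from urllib.parse import urlparse


def parse_s3_uri(uri: str) -> Tuple[str, str]:
    """Split an s3://bucket/key URI into its bucket name and object key."""
    parsed = urlparse(uri)
    key = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not parsed.netloc or not key:
        raise ValueError(f"not a valid S3 object URI: {uri!r}")
    return parsed.netloc, key


def download_from_s3(uri: str, local_path: str) -> str:
    """Download the object to local_path (needs AWS credentials and boto3)."""
    import boto3  # imported lazily so parse_s3_uri works without the SDK installed

    bucket, key = parse_s3_uri(uri)
    boto3.client("s3").download_file(bucket, key, local_path)
    return local_path
```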
When using HealthImaging, a single DICOM instance can be retrieved using the GetDICOMInstance API action and converted to NIfTI format using SimpleITK. For multi-frame images, the container will download all of the pixel frames for a given ImageSet on HealthImaging, decode them with nvJPEG2000 on a GPU, and convert the resulting NumPy arrays into NIfTI files using CuPy and SimpleITK.
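To make the multi-frame path more concrete, here is a hedged sketch of the retrieval half only. It assumes the image set metadata document nests frames under Study, Series, Instances, and ImageFrames, and it uses the Boto3 medical-imaging client calls get_image_set_metadata and get_image_frame. The helper names are hypothetical, the frames are returned still encoded, and GPU decoding, slice ordering, and NIfTI conversion are left out.

```python
import gzip
import json
from typing import Dict, List


def image_frame_ids(metadata: Dict) -> List[str]:
    """Collect every image frame ID listed in an image set metadata document."""
    frame_ids = []
    for series in metadata.get("Study", {}).get("Series", {}).values():
        for instance in series.get("Instances", {}).values():
            for frame in instance.get("ImageFrames", []):
                frame_ids.append(frame["ID"])
    return frame_ids


def fetch_encoded_frames(datastore_id: str, image_set_id: str) -> List[bytes]:
    """Download all pixel frames of one image set (needs AWS credentials and boto3)."""
    import boto3  # imported lazily so image_frame_ids works without the SDK installed

    client = boto3.client("medical-imaging")
    blob = client.get_image_set_metadata(
        datastoreId=datastore_id, imageSetId=image_set_id
    )["imageSetMetadataBlob"].read()
    metadata = json.loads(gzip.decompress(blob))  # the metadata blob is gzip-compressed JSON

    frames = []
    for frame_id in image_frame_ids(metadata):
        response = client.get_image_frame(
            datastoreId=datastore_id,
            imageSetId=image_set_id,
            imageFrameInformation={"imageFrameId": frame_id},
        )
        frames.append(response["imageFrameBlob"].read())  # still encoded at this point
    return frames
```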
You can post the requests to the NIM endpoints using the following example URIs:
- For Amazon S3: s3://<s3bucket>/example-1.nii.gz
- For HealthImaging DICOMweb API: https://dicom-medical-imaging.us-east-1.amazonaws.com/datastore/<datastoreId>/studies/<StudyUID>/series/<SeriesUID>/instances/<InstanceUID>?imageSetId=<imagesetId>
- For HealthImaging GetImageFrame API: healthimaging://<datastoreId>/<imagesetId>
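The three URI shapes above can be assembled from their identifiers with a few small helpers. The function names are hypothetical and exist only for illustration; the formats mirror the example URIs in the list.

```python
def s3_image_uri(bucket: str, key: str) -> str:
    """URI for an image file stored in Amazon S3."""
    return f"s3://{bucket}/{key}"


def dicomweb_instance_uri(region: str, datastore_id: str, study_uid: str,
                          series_uid: str, instance_uid: str, image_set_id: str) -> str:
    """URI for a single DICOM instance served by the HealthImaging DICOMweb API."""
    return (
        f"https://dicom-medical-imaging.{region}.amazonaws.com/datastore/{datastore_id}"
        f"/studies/{study_uid}/series/{series_uid}/instances/{instance_uid}"
        f"?imageSetId={image_set_id}"
    )


def image_frames_uri(datastore_id: str, image_set_id: str) -> str:
    """URI asking the connector to fetch all frames of an image set via GetImageFrame."""
    return f"healthimaging://{datastore_id}/{image_set_id}"
```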
Once you have this customized container in Amazon ECR, you can deploy it on either SageMaker or Amazon EKS. To run on a SageMaker managed inference endpoint, the customized container listens on port 8080 and accepts POST requests to the /invocations path. SageMaker inference endpoints are managed, with automatic health checks, load balancing, and autoscaling. With the pre-built Helm chart, you can also deploy this customized NIM container on Amazon EKS and monitor the deployment using Amazon CloudWatch.
1. Amazon SageMaker Deployment Walkthrough
Using Amazon SageMaker, you can deploy different types of highly available and monitored inference endpoints: near real-time endpoints, asynchronous endpoints for micro-batch inference, and batch transform jobs for large batches. You can use the AWS SDK for Python (Boto3) in a Jupyter notebook to create a SageMaker near real-time inference endpoint:
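A minimal sketch of those Boto3 calls, assuming a single production variant on an ml.g5.xlarge instance and reusing one name for the model, endpoint configuration, and endpoint. The names, instance type, and optional environment variables are assumptions for illustration; the notebook in the sample repo may configure them differently.

```python
from typing import Dict, Optional


def realtime_endpoint_requests(name: str, image_uri: str, role_arn: str,
                               instance_type: str = "ml.g5.xlarge",
                               instance_count: int = 1,
                               environment: Optional[Dict[str, str]] = None) -> Dict[str, Dict]:
    """Build the three request payloads needed for a real-time endpoint."""
    container = {"Image": image_uri}
    if environment:
        container["Environment"] = dict(environment)  # container environment variables
    return {
        "model": {"ModelName": name, "ExecutionRoleArn": role_arn,
                  "PrimaryContainer": container},
        "endpoint_config": {
            "EndpointConfigName": name,
            "ProductionVariants": [{
                "VariantName": "AllTraffic",
                "ModelName": name,
                "InstanceType": instance_type,
                "InitialInstanceCount": instance_count,
            }],
        },
        "endpoint": {"EndpointName": name, "EndpointConfigName": name},
    }


def create_realtime_endpoint(requests: Dict[str, Dict], region: str) -> None:
    """Create the endpoint and wait until it is in service (needs AWS credentials)."""
    import boto3  # imported lazily so the request builder works without the SDK installed

    sm = boto3.client("sagemaker", region_name=region)
    sm.create_model(**requests["model"])
    sm.create_endpoint_config(**requests["endpoint_config"])
    sm.create_endpoint(**requests["endpoint"])
    sm.get_waiter("endpoint_in_service").wait(
        EndpointName=requests["endpoint"]["EndpointName"]
    )
```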
Or you can add AsyncInferenceConfig in endpoint configuration to create an async inference endpoint:
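As a hedged illustration, an endpoint configuration with AsyncInferenceConfig might look like the following. The S3 output path, instance type, and concurrency limit are assumptions; submit_async then shows how a request stored in Amazon S3 could be queued with invoke_endpoint_async.

```python
from typing import Dict


def async_endpoint_config(config_name: str, model_name: str, s3_output_path: str,
                          instance_type: str = "ml.g5.xlarge",
                          max_concurrent_invocations: int = 1) -> Dict:
    """Build a create_endpoint_config request for an asynchronous endpoint."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InstanceType": instance_type,
            "InitialInstanceCount": 1,
        }],
        "AsyncInferenceConfig": {
            # Results are written to this S3 prefix instead of being returned inline.
            "OutputConfig": {"S3OutputPath": s3_output_path},
            "ClientConfig": {"MaxConcurrentInvocationsPerInstance": max_concurrent_invocations},
        },
    }


def submit_async(endpoint_name: str, input_location: str) -> str:
    """Queue one request whose payload is stored in S3; returns the S3 output location."""
    import boto3  # imported lazily so the config builder works without the SDK installed

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint_async(
        EndpointName=endpoint_name,
        InputLocation=input_location,
        ContentType="application/json",
    )
    return response["OutputLocation"]
```

You would pass the returned dictionary to create_endpoint_config on the SageMaker client, then create the endpoint from that configuration as for a real-time endpoint.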
If you select an asynchronous endpoint, you can autoscale the compute capacity down to zero instances during times of low usage. This avoids paying for idle instances and reduces your costs automatically. You can do this by defining an autoscaling policy that permits scaling the SageMaker inference endpoint to zero instances, as follows:
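One way to express this with Application Auto Scaling is sketched below: register the endpoint variant as a scalable target with a minimum capacity of zero, then attach a target tracking policy on the ApproximateBacklogSizePerInstance metric. The policy name, target value, cooldowns, and maximum capacity are illustrative assumptions, not recommended settings.

```python
from typing import Dict


def scalable_target(endpoint_name: str, variant_name: str = "AllTraffic",
                    max_capacity: int = 2) -> Dict:
    """register_scalable_target request that allows the variant to reach zero instances."""
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": 0,
        "MaxCapacity": max_capacity,
    }


def backlog_tracking_policy(endpoint_name: str, variant_name: str = "AllTraffic",
                            target_backlog: float = 5.0) -> Dict:
    """put_scaling_policy request that tracks the queue backlog per instance."""
    return {
        "PolicyName": f"{endpoint_name}-backlog-tracking",
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_backlog,
            "CustomizedMetricSpecification": {
                "MetricName": "ApproximateBacklogSizePerInstance",
                "Namespace": "AWS/SageMaker",
                "Dimensions": [{"Name": "EndpointName", "Value": endpoint_name}],
                "Statistic": "Average",
            },
            "ScaleInCooldown": 600,   # seconds to wait before removing instances
            "ScaleOutCooldown": 300,  # seconds to wait before adding more instances
        },
    }


def apply_scale_to_zero(endpoint_name: str) -> None:
    """Register the target and attach the policy (needs AWS credentials and boto3)."""
    import boto3  # imported lazily so the request builders work without the SDK installed

    autoscaling = boto3.client("application-autoscaling")
    autoscaling.register_scalable_target(**scalable_target(endpoint_name))
    autoscaling.put_scaling_policy(**backlog_tracking_policy(endpoint_name))
```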
You will also need to add a policy to start a new instance for the inference endpoint if there are new requests in the queue:
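A sketch of that scale-out policy, assuming a step scaling policy triggered by a CloudWatch alarm on the HasBacklogWithoutCapacity metric. The alarm name, evaluation periods, and cooldown are illustrative choices.

```python
from typing import Dict


def scale_out_from_zero_policy(endpoint_name: str, variant_name: str = "AllTraffic") -> Dict:
    """put_scaling_policy request that adds one instance when the alarm fires."""
    return {
        "PolicyName": f"{endpoint_name}-has-backlog-without-capacity",
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "StepScaling",
        "StepScalingPolicyConfiguration": {
            "AdjustmentType": "ChangeInCapacity",
            "MetricAggregationType": "Average",
            "Cooldown": 300,
            "StepAdjustments": [{"MetricIntervalLowerBound": 0, "ScalingAdjustment": 1}],
        },
    }


def backlog_without_capacity_alarm(endpoint_name: str, policy_arn: str) -> Dict:
    """put_metric_alarm request that fires when requests queue up with no instances running."""
    return {
        "AlarmName": f"{endpoint_name}-has-backlog-without-capacity",
        "MetricName": "HasBacklogWithoutCapacity",
        "Namespace": "AWS/SageMaker",
        "Statistic": "Average",
        "Period": 60,
        "EvaluationPeriods": 2,
        "DatapointsToAlarm": 2,
        "Threshold": 1,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "missing",
        "Dimensions": [{"Name": "EndpointName", "Value": endpoint_name}],
        "AlarmActions": [policy_arn],  # the alarm triggers the step scaling policy
    }


def apply_scale_out_from_zero(endpoint_name: str) -> None:
    """Create the policy and its alarm (needs AWS credentials and boto3)."""
    import boto3  # imported lazily so the request builders work without the SDK installed

    autoscaling = boto3.client("application-autoscaling")
    policy_arn = autoscaling.put_scaling_policy(
        **scale_out_from_zero_policy(endpoint_name)
    )["PolicyARN"]
    boto3.client("cloudwatch").put_metric_alarm(
        **backlog_without_capacity_alarm(endpoint_name, policy_arn)
    )
```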
2. Amazon EKS Deployment Walkthrough
We will next walk through deploying the VISTA-3D NIM using Amazon EKS. First clone the Data-on-EKS repo and go to the ai/ml folder that contains the installation script. Before deployment, use your preferred code editor to change the instance size for hosting NIM containers in the eks.tf file to a smaller instance, like a g5.xlarge. Also, change the Amazon EKS cluster name to vistanim-on-eks in the variables.tf file. Change the AWS Region in the same file to where you want to host the inference endpoint, for example, us-east-1.
After you have made these changes, you can deploy the stack by running ./install.sh. When that finishes successfully, configure kubectl:
aws eks update-kubeconfig --name vistanim-on-eks --region <region>
Now you can use the Helm chart created by NVIDIA to deploy the NIM container on this Amazon EKS cluster. Clone the NVIDIA nim-deploy repo and move the helm folder into the current folder. Edit the VISTA-3D NIM configuration file to replace the container image repository and tag with the ones in your private Amazon ECR. Then deploy the VISTA-3D NIM using the nim-deploy Helm chart:
Check that your pods are up and healthy (pods should be in a Running state with 1/1 Ready status): kubectl get pods -n vista -o wide
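The deployment step can be approximated with a small Python wrapper around helm install, shown here only as a sketch. The release name, chart path, and values file name are hypothetical placeholders, since the exact paths depend on the nim-deploy repo layout; the vista namespace matches the kubectl commands used in this walkthrough.

```python
import subprocess
from typing import List


def helm_install_command(release: str, chart_path: str, values_file: str,
                         namespace: str = "vista") -> List[str]:
    """Build the argument list for a helm install into the given namespace."""
    return [
        "helm", "install", release, chart_path,
        "--namespace", namespace,
        "--create-namespace",  # create the namespace if it does not exist
        "--values", values_file,
    ]


def run(command: List[str]) -> None:
    """Run the command and raise if it exits with a non-zero status."""
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Placeholder names for illustration; substitute the chart path and the
    # edited VISTA-3D configuration file from your local checkout.
    print(" ".join(helm_install_command("vista3d", "./helm/<chart>", "<values-file>.yaml")))
```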
You should see pods in a running state, like in Figure 4.
Figure 4. Screenshot of the terminal running kubectl command line showing running pods inside an EKS cluster
Now you can allow traffic to the NIM inference endpoints by deploying the Application Load Balancer ingress from the ingress.yaml configuration file, which exposes the VISTA-3D NIM: kubectl apply -f eks/ingress.yaml
You can check the public address of the Application Load Balancer created by the AWS Load Balancer Controller: kubectl get ing -n vista -o wide
With the public Application Load Balancer domain address you can now make requests to your VISTA-3D image endpoint:
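A request might be sketched as follows. The /vista3d/inference path and the JSON body with image and prompts.classes fields follow the VISTA-3D NIM API as we understand it, and are assumptions to verify against the VISTA-3D documentation and your ingress rules. The helper names, plain HTTP scheme, and timeout are illustrative.

```python
import json
import urllib.request
from typing import Iterable, Optional


def build_inference_request(alb_address: str, image_uri: str,
                            classes: Optional[Iterable[str]] = None) -> urllib.request.Request:
    """Build a POST request asking the NIM to segment the image at image_uri."""
    payload = {"image": image_uri}
    if classes:
        payload["prompts"] = {"classes": list(classes)}  # restrict output to these structures
    return urllib.request.Request(
        url=f"http://{alb_address}/vista3d/inference",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_inference(request: urllib.request.Request, output_path: str,
                  timeout: float = 600.0) -> str:
    """Send the request and save the raw response body to output_path."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        with open(output_path, "wb") as out:
            out.write(response.read())
    return output_path
```

For example, build_inference_request("<alb-address>", "s3://<s3bucket>/example-1.nii.gz", ["liver", "spleen"]) prepares a request for two organs without sending it.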
Using this automated Amazon EKS deployment, you get Container Insights for observability out of the box, as shown in Figure 5.
 Figure 5 – Screenshot of AWS CloudWatch for Container Insights.
Once you are done with all experiments, run the following script to delete the Amazon EKS cluster: ./cleanup.sh
Conclusion
We walked through two ways to deploy the medical imaging NIM from NVIDIA using managed services like Amazon SageMaker, Amazon EKS and HealthImaging. By taking advantage of the automated deployment and built-in features for high availability and observability, you can have a scalable, production-ready medical imaging AI system available to be integrated into your medical imaging workflows.
We’d like to acknowledge Ahmed Harouni, a Technical Marketing Engineer at NVIDIA specializing in deep learning for medical imaging, for his contribution to the creation of this blog’s content.
Contact an AWS Representative to learn how we can help accelerate your business.
Further Reading or Actions
- Read more solutions and case studies about AWS for medical imaging
- Follow the AWS Well-Architected Framework Healthcare Industry Lens
- Learn more about how to deploy generative AI faster with NVIDIA NIM on AWS
- Follow instructions in this GitHub repo if you want to deploy an OHIF image viewer working with AWS HealthImaging.
- Explore this GitHub repo to find more open-source solutions working with AWS HealthImaging.
- This GitHub repo shows how to use the MONAI library to fine tune a VISTA model.

