Medical LLM - 8B

Medical model excelling at clinical tasks with efficient deployment and cost-effectiveness, ideal for rapid, high-accuracy responses.

Overview

This next-generation 8B-parameter medical language model preserves the deployment-friendly footprint of our earlier 7B release while introducing a dedicated reasoning mode that can follow multi-step clinical logic and justify its answers. Trained on an expanded, carefully curated corpus of medical literature and reinforced with chain-of-thought supervision, it excels at differential diagnosis, guideline-aware care planning, and complex patient-note summarization.

Its compact size enables faster inference and lower compute costs than larger variants, which suits high-throughput environments that need quick responses while maintaining high accuracy in core medical tasks. Like its siblings, it is optimized for Retrieval-Augmented Generation (RAG) and integrates with healthcare databases and EHR systems. Choose this model when rapid response times and cost-effectiveness are priorities and essential medical comprehension must not be compromised.

IMPORTANT USAGE INFORMATION:

After subscribing to this product and creating a SageMaker endpoint, billing occurs on an HOURLY BASIS for as long as the endpoint is running.

- Charges apply even if the endpoint is idle and not actively processing requests.

- To stop charges, you MUST DELETE the endpoint in your SageMaker console.

- Simply stopping requests will NOT stop billing.

Deleting the endpoint as soon as you are finished ensures you are billed only for the time it was actually running.
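The teardown step can also be scripted. The following is a minimal sketch, assuming the boto3 SageMaker client and an endpoint name chosen at deployment time (the name in the usage comment is hypothetical). The client is passed in as an argument so the function can be exercised without AWS credentials. Removing the endpoint configuration is optional housekeeping; deleting the endpoint is the step that stops the hourly charges.

```python
def delete_endpoint_and_config(sm_client, endpoint_name):
    """Delete a SageMaker endpoint and its endpoint configuration.

    sm_client is expected to be boto3.client("sagemaker").
    Returns the name of the endpoint configuration that was removed.
    """
    # Look up the configuration name first: it can no longer be read from
    # the endpoint once the endpoint itself has been deleted.
    description = sm_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = description["EndpointConfigName"]

    # Deleting the endpoint is what ends the per-hour billing.
    sm_client.delete_endpoint(EndpointName=endpoint_name)

    # Optional cleanup of the now-unused endpoint configuration.
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    return config_name


# Usage (requires AWS credentials; the endpoint name is hypothetical):
#   import boto3
#   delete_endpoint_and_config(boto3.client("sagemaker"), "medical-llm-8b")
```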

Highlights

**Real-Time Inference**
* Instance Type: **ml.g5.12xlarge**
* Maximum Model Length: 32,000 tokens
* Tokens per second during real-time inference:
  * **Text Completion / Summarization**: up to 226 tokens per second
  * **Text Completion / QA**: up to 834 tokens per second

**Batch Transform**
* Instance Type: **ml.g5.12xlarge**
* Maximum Model Length: 32,000 tokens
* Tokens per second during batch transform operations:
  * **Text Completion / Summarization**: up to 145 tokens per second
  * **Text Completion / QA**: up to 693 tokens per second

**Accuracy**
* Outperforms Med-PaLM-1 in clinical reasoning (86.81% vs 83.8%)
* Achieves 75.30% average across OpenMed benchmarks, comparable to larger models
* Superior performance in PubMedQA (76.6%) vs similar-sized models
* Matches GPT-4's accuracy in medical QA tasks while being 100x smaller
* Ideal for cost-efficient clinical deployments with fast inference
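As a rough planning aid, the throughput figures can be turned into a best-case processing time. This is a sketch only: the listing quotes "up to" rates and does not say whether they count input tokens, output tokens, or both, so the result is a lower bound and real workloads will likely take longer. The function and dictionary names are illustrative.

```python
# Published "up to" throughput on ml.g5.12xlarge, in tokens per second.
PEAK_TOKENS_PER_SECOND = {
    ("real-time", "summarization"): 226,
    ("real-time", "qa"): 834,
    ("batch", "summarization"): 145,
    ("batch", "qa"): 693,
}


def min_processing_seconds(total_tokens, mode, task):
    """Lower bound on processing time, assuming the peak rate is sustained."""
    return total_tokens / PEAK_TOKENS_PER_SECOND[(mode, task)]


# One million summarization tokens through a real-time endpoint.
seconds = min_processing_seconds(1_000_000, "real-time", "summarization")
print(f"at least {seconds / 3600:.2f} hours")  # at least 1.23 hours
```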

Details

Sold by

John Snow Labs

Pricing

Free trial

Try this product free for 15 days according to the free trial terms set by the vendor.

Pricing is based on actual usage, with charges varying according to how much you consume. Subscriptions have no end date and may be canceled any time.

Additional AWS infrastructure costs may apply. Use the AWS Pricing Calculator to estimate your infrastructure costs.

Usage costs (4)

Dimension	Description	Cost/host/hour
ml.g5.12xlarge Inference (Batch) (Recommended)	Model inference on the ml.g5.12xlarge instance type, batch mode	$9.98
ml.g5.12xlarge Inference (Real-Time) (Recommended)	Model inference on the ml.g5.12xlarge instance type, real-time mode	$9.98
ml.g4dn.12xlarge Inference (Batch)	Model inference on the ml.g4dn.12xlarge instance type, batch mode	$9.98
ml.g4dn.12xlarge Inference (Real-Time)	Model inference on the ml.g4dn.12xlarge instance type, real-time mode	$9.98
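To see how the hourly rate adds up, here is a small arithmetic sketch using the $9.98 per host per hour software rate from the table. It covers the software charge only: SageMaker infrastructure costs are billed separately, and the free trial and any rounding of partial hours are not modeled. The function name is illustrative.

```python
SOFTWARE_RATE_USD_PER_HOST_HOUR = 9.98  # software rate listed for every dimension


def software_cost_usd(hours, hosts=1, rate=SOFTWARE_RATE_USD_PER_HOST_HOUR):
    """Software charge only; AWS infrastructure costs come on top of this."""
    return round(hours * hosts * rate, 2)


print(software_cost_usd(24))       # one host for a day: 239.52
print(software_cost_usd(24 * 30))  # one host left running for 30 days: 7185.6
```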

Vendor refund policy

No refunds are possible.

Legal

Vendor terms and conditions

Upon subscribing to this product, you must acknowledge and agree to the terms and conditions outlined in the vendor's End User License Agreement (EULA).

Content disclaimer

Vendors are responsible for their product descriptions and other product content. AWS does not warrant that vendors' product descriptions or other product content are accurate, complete, reliable, current, or error-free.

Usage information

Delivery details

Amazon SageMaker model

An Amazon SageMaker model package is a pre-trained machine learning model ready to use without additional training. Use the model package to create a model on Amazon SageMaker for real-time inference or batch processing. Amazon SageMaker is a fully managed platform for building, training, and deploying machine learning models at scale.

Deploy the model on Amazon SageMaker AI using the following options:

Real-time inference

Deploy the model as an API endpoint for your applications. When you send data to the endpoint, SageMaker processes it and returns results by API response. The endpoint runs continuously until you delete it. You're billed for software and SageMaker infrastructure costs while the endpoint runs. AWS Marketplace models don't support Amazon SageMaker Asynchronous Inference. For more information, see Deploy models for real-time inference.

Batch transform

Deploy the model to process batches of data stored in Amazon Simple Storage Service (Amazon S3). SageMaker runs the job, processes your data, and returns results to Amazon S3. When complete, SageMaker stops the model. You're billed for software and SageMaker infrastructure costs only during the batch job. Duration depends on your model, instance type, and dataset size. AWS Marketplace models don't support Amazon SageMaker Asynchronous Inference. For more information, see Batch transform for inference with Amazon SageMaker AI.

Version release notes

Introducing a dedicated reasoning mode that can follow multi-step clinical logic and justify its answers.

Additional details

Inputs

Summary: Input Format

1. Chat Completion

{
  "model": "/opt/ml/model",
  "messages": [
    {"role": "system", "content": "You are a helpful medical assistant."},
    {"role": "user", "content": "What should I do if I have a fever and body aches?"}
  ],
  "max_tokens": 1024,
  "temperature": 0.7
}

For additional parameters see:

- ChatCompletionRequest
- OpenAI's Chat API
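A chat-completion request in the format above could be sent to a deployed endpoint roughly as follows. This is a sketch under assumptions: it uses the boto3 `sagemaker-runtime` client's `invoke_endpoint` call, the endpoint name is whatever was chosen at deployment, and the helper names are illustrative. The listing does not document the response schema, so the parsed JSON is returned unchanged.

```python
import json


def build_chat_payload(user_message,
                       system_message="You are a helpful medical assistant.",
                       max_tokens=1024, temperature=0.7):
    """Return the JSON request body for the chat-completion input format."""
    payload = {
        "model": "/opt/ml/model",  # fixed model path required by the listing
        "messages": [
            {"role": "system", "content": system_message},
            {"role": "user", "content": user_message},
        ],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return json.dumps(payload)


def invoke_chat(runtime_client, endpoint_name, user_message):
    """Send one chat request; runtime_client is boto3.client("sagemaker-runtime")."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",  # the documented input MIME type
        Body=build_chat_payload(user_message).encode("utf-8"),
    )
    # The response schema is not documented here, so return the parsed JSON as-is.
    return json.loads(response["Body"].read())
```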

2. Text Completion

Single Prompt Example

{
  "model": "/opt/ml/model",
  "prompt": "How can I maintain good kidney health?",
  "max_tokens": 512,
  "temperature": 0.6
}

Multiple Prompts Example

{
  "model": "/opt/ml/model",
  "prompt": [
    "How can I maintain good kidney health?",
    "What are the best practices for kidney care?"
  ],
  "max_tokens": 512,
  "temperature": 0.6
}
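Because the "prompt" field accepts either a single string or a list of strings, one helper can cover both text-completion shapes. The sketch below only builds the request body; the function name and the validation rule are illustrative assumptions, not part of the vendor's documentation.

```python
import json


def build_completion_payload(prompts, max_tokens=512, temperature=0.6):
    """Build a text-completion body from one prompt (str) or several (list of str)."""
    if isinstance(prompts, str):
        prompt_field = prompts  # single-prompt form
    else:
        prompt_field = list(prompts)  # multiple-prompts form
        if not prompt_field or not all(isinstance(p, str) for p in prompt_field):
            raise ValueError("prompts must be a string or a non-empty list of strings")
    return json.dumps({
        "model": "/opt/ml/model",  # fixed model path required by the listing
        "prompt": prompt_field,
        "max_tokens": max_tokens,
        "temperature": temperature,
    })


single = build_completion_payload("How can I maintain good kidney health?")
batch = build_completion_payload([
    "How can I maintain good kidney health?",
    "What are the best practices for kidney care?",
])
```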

Important Notes:

- Streaming Responses: Add "stream": true to your request payload to enable streaming.
- Model Path Requirement: Always set "model": "/opt/ml/model" (SageMaker's fixed model location).
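A streaming request might be consumed as sketched below. It assumes the boto3 `sagemaker-runtime` client's `invoke_endpoint_with_response_stream` call, whose response body yields events of the form `{"PayloadPart": {"Bytes": ...}}`. The listing does not describe the chunk format, so this sketch assumes newline-delimited text and simply yields each non-empty line without interpreting it. Helper names are illustrative.

```python
import json


def iter_stream_lines(event_stream):
    """Yield complete, non-empty text lines from a SageMaker response stream.

    A line may be split across events, so bytes are buffered until a
    newline arrives. Newline-delimited output is an assumption.
    """
    buffer = b""
    for event in event_stream:
        part = event.get("PayloadPart")
        if part is None:
            continue  # skip events that carry no payload bytes
        buffer += part["Bytes"]
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            if line.strip():
                yield line.decode("utf-8").strip()
    if buffer.strip():
        yield buffer.decode("utf-8").strip()  # trailing data without a newline


def stream_completion(runtime_client, endpoint_name, payload):
    """Invoke the endpoint with "stream": true and yield raw response lines."""
    body = json.dumps({**payload, "stream": True}).encode("utf-8")
    response = runtime_client.invoke_endpoint_with_response_stream(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=body,
    )
    yield from iter_stream_lines(response["Body"])
```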

Input MIME type: application/json

Real-time inference sample input data

https://github.com/JohnSnowLabs/spark-nlp-workshop/tree/master/products/sagemaker/models/JSL-Medical-LLM-8B/inputs/real-time

Batch transform sample input data

https://github.com/JohnSnowLabs/spark-nlp-workshop/tree/master/products/sagemaker/models/JSL-Medical-LLM-8B/inputs/batch

Resources

Vendor resources

Model documentation

Measuring the Benefits of Healthcare Specific Large Language Models

Support

Vendor support

For any assistance, please reach out to support@johnsnowlabs.com.

AWS infrastructure support

AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.

Similar products

Medical Visual LLM - 8B

By John Snow Labs

This 8B parameter vision-language model delivers medical-grade multimodal intelligence in a compact, efficient format. It understands both clinical text and visual content - interpreting X-rays, MRIs, pathology slides, diagrams, and structured documents such as charts, forms, and tables.

Polish LLM Deployment on AWS – Bielik & PLLuM with SageMaker & EC2

By Chaos Gears

Bielik and PLLuM are open-weight large-language models built specifically for Polish and Slavic languages, released under Apache-2.0 so you can run them anywhere, including AWS. Our offer provisions a fully managed, right-sized inference stack in your AWS account - Amazon SageMaker real-time or serverless endpoints or GPU EC2 (Inf2/P5) clusters - fronted by autoscaling and secure networking. Because models stay on dedicated instances, you avoid the quota limits and surprise version changes common with third-party APIs. No weights are fine-tuned on your prompts; data never leaves your VPC, satisfying strict Polish healthcare (RODO) and KNF financial guidelines.

Medical LLM - Medium

By John Snow Labs

Use for chat, RAG, medical summarization, open-book question answering with context of up to 32K tokens.

Medical Text Translation (EN-ES)

By John Snow Labs

Cutting-edge English - Spanish medical translation model.

Medical LLM - Small

By John Snow Labs

Use for tasks like medical summarization or open-book question answering with context of up to 40K tokens.

UiPath Generative AI for Medical Records Summarization

By UiPath Inc.

With our new medical record summarization activity powered by Anthropic's Claude Opus model, UiPath can provide a clinician-level summarization of a medical record organized in easy-to-understand segments. Users can summarize everything from demographics to lab results to surgical histories while having complete control over the type of information summarized.

The Infrastructure for Medical Device Clouds

By BioT Medical

BioT is the infrastructure for medical device clouds - the flexible, open, and secure foundation needed to build your cloud right and deliver more impactful care from the start.

Customer reviews

No customer reviews yet
