Overview
This service offers a hosted version of the DeepSeek-R1-Distill-Qwen-1.5B model (https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B), which operates within your private cloud. After you subscribe to the listing, a CloudFormation deployment launches in your AWS account, setting up an EKS cluster that runs an inference service for the DeepSeek-R1-Distill-Qwen-1.5B model. Once installation is complete, an API endpoint is made available for querying the service.
The DeepSeek-R1-Distill-Qwen-1.5B model is fine-tuned from the open-source Qwen model using samples generated by DeepSeek-R1. The DeepSeek team showed that the reasoning patterns discovered through reinforcement learning in the 671B-parameter DeepSeek-R1 model can be distilled into small dense models with little loss in capability. This 1.5B checkpoint is the smallest of those distillations.
DeepSeek-R1-Distill-Qwen-1.5B punches well above its weight in math- and code-heavy reasoning while still fitting on a single laptop GPU (~4 GB in 8-bit). Use it whenever you need solid chain-of-thought performance under tight VRAM and latency budgets.
Architecture: 1.78B-parameter decoder-only Transformer (Qwen2.5-Math-1.5B base) distilled from the 671B-parameter DeepSeek-R1 reasoning model
Context length: 32,768 tokens (inherits Qwen2.5 long-context support)
Highlights
- Privately hosted version of the Qwen-based DeepSeek-R1 1.5B distilled model running securely in your cloud. Never worry about data leaving your cloud.
- Performance of DeepSeek-R1-Distill-Qwen-1.5B on key benchmarks:
  - AIME 2024 pass@1: 28.9
  - AIME 2024 cons@64: 52.7
  - MATH-500 pass@1: 83.9
  - GPQA Diamond pass@1: 33.8
  - LiveCodeBench pass@1: 16.9
  - CodeForces rating: 954
- DeepSeek-R1-Distill-Qwen-1.5B is a rare mix of tiny footprint and serious analytical power. When your problem looks more like an Olympiad question or a LeetCode hard than a casual conversation, and you only have laptop-grade hardware, this is the model to load.
Details

Pricing
Free trial
Dimension (EC2 instance type) | Cost/hour |
---|---|
g5.8xlarge | $0.10 |
g5.2xlarge | $0.10 |
p5e.48xlarge | $0.10 |
g5.4xlarge | $0.10 |
p5.48xlarge | $0.10 |
p4d.24xlarge | $0.10 |
g5.16xlarge | $0.10 |
g5.24xlarge | $0.10 |
g5.12xlarge | $0.10 |
p5en.48xlarge | $0.10 |
Vendor refund policy
Contact support@aumlabs.ai.
Legal
Vendor terms and conditions
Content disclaimer
Delivery details
Install DeepSeek as a Service stack via the CloudFormation template
Launch a production-ready DeepSeek-as-a-Service in minutes with this turnkey CloudFormation template. It automatically provisions an EKS cluster (with optional VPC creation), GPU-powered node group, secure ACM-validated domain, and a fully-configured Helm deployment of the DeepSeek model—no manual Kubernetes tinkering required. Simply deploy the stack and start serving blazing-fast, scalable generative-AI endpoints from your own AWS account.
Key capabilities
- One-click deployment – spin up the entire stack in ~15 minutes without touching kubectl, Helm charts, or ACM.
- GPU-optimized – node group is pre-sized for latency-critical inference and ships with NVIDIA’s device plugin.
- Automatic HTTPS – a Lambda workflow requests an ACM certificate, validates DNS, and wires TLS to the load balancer; the issued customer domain is stored in SSM and pushed to your SaaS account via SNS.
- Bring-your-own or new VPC – the template can search for a compatible private-subnet pair with NAT gateways, or build an isolated /16 VPC complete with public & private subnets, IGWs, NAT gateways, and route tables.
- Self-cleaning bootstrap – a short-lived EC2 builder instance handles cluster configuration, Helm install, and signals CloudFormation, then terminates itself to avoid idle costs.
- Full control – after launch you have full control of the cluster just like any EKS environment: scale nodes, roll images, or extend with additional micro-services.
What the template creates
- (Optional) New IPv4 /16 VPC with two AZ-balanced public & private subnets, route tables, IGW, and redundant NAT gateways.
- Amazon EKS cluster (v1.32) with dedicated control-plane security group and API server endpoints opened for HTTPS.
- GPU node group (A10, A100 or H100 GPUs, AL2_x86_64_GPU AMI, 100 GB EBS).
- Amazon ACM certificate for a unique sub-domain (e.g., <your-unique-identifier>.aws.aumlabs.ai) and an NLB configured with HTTP → HTTPS redirect and TLS offload.
- Helm-based deployment (llm-inference namespace) of the DeepSeek container image (a quick verification sketch follows this list).
- AWS SSM Parameter /eks/<ClusterName>/customer-domain containing the final service URL, for easy CI/CD integration.
- Helper Lambdas and IAM roles to automate VPC discovery, certificate issuance, secret propagation, and SNS notifications.
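To sanity-check these resources once the stack finishes, a minimal verification from your workstation might look like the sketch below. It assumes the AWS CLI and kubectl are installed; the region is a placeholder and the cluster name is the ClusterName parameter (default llm-inference).

```bash
# Point kubectl at the newly created cluster (region/name are placeholders).
aws eks update-kubeconfig --region <region> --name llm-inference

# The GPU node registered by the managed node group should report Ready.
kubectl get nodes

# The DeepSeek inference pods deployed by the Helm chart live in llm-inference.
kubectl get pods -n llm-inference
```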
CloudFormation Template (CFT)
AWS CloudFormation templates are JSON or YAML-formatted text files that simplify provisioning and management on AWS. The templates describe the service or application architecture you want to deploy, and AWS CloudFormation uses those templates to provision and configure the required services (such as Amazon EC2 instances or Amazon RDS DB instances). The deployed application and associated resources are called a "stack."
Version release notes
DeepSeek R1 Distill Qwen 1.5B as a service:
Privately hosted version of DeepSeek R1 Distill Qwen 1.5B (https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) running in your cloud! Explore the full potential of DeepSeek R1 models without ever worrying about data leaving your cloud!
Additional details
Usage instructions
Follow the steps below to launch, configure, and run the DeepSeek LLM-Inference stack in your own AWS account.
- Prerequisites
- AWS account with permissions to create VPC, EKS, EC2 (g5/p4/p5), ACM, IAM, SSM, and SNS resources.
- Sufficient g5/p4/p5 GPU quota in two AZs of your chosen region. You need a quota of at least 8 vCPUs to launch the stack on the cheapest available A10 GPU instance (g5.xlarge). You can request a quota increase at https://us-east-1.console.aws.amazon.com/servicequotas/home/services/ec2/quotas/L-DB2E81BA (a CLI check is sketched after this list).
- (Optional) Existing VPC with >= 2 private subnets that have NAT-gateway egress if you plan to reuse your own network. Otherwise, the stack will automatically create a new VPC and subnets in your AWS account.
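If you are unsure whether your account already has enough G-instance vCPU quota, a quick CLI check might look like the sketch below; it uses the quota code from the Service Quotas link above, and the region is whichever one you plan to deploy in.

```bash
# Current applied quota for the EC2 quota code linked above; a value of 8 or
# more vCPUs is enough for a single g5.xlarge node.
aws service-quotas get-service-quota \
  --service-code ec2 \
  --quota-code L-DB2E81BA \
  --region us-east-1 \
  --query 'Quota.Value'
```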
- Subscribe & launch
- Click Continue to Subscribe on the AWS Marketplace page and accept the terms.
- Choose Continue to Configuration -> Continue to Launch.
- Select the CloudFormation delivery option and choose the region in which you want to deploy.
- Press Launch to open the stack in CloudFormation, then choose Next to reach the parameter wizard.
- (Optional) Configure stack parameters
- ClusterName - Friendly name for the EKS cluster (default llm-inference).
- CreateVPC - true to build a new /16 VPC, or false to reuse an existing, automatically discovered one (default: true). Leave the parameters at their defaults unless you have a specific reason to change them. Click Next, add optional tags, then choose Create stack (an equivalent CLI launch is sketched below).
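The Marketplace flow launches the template through the console, but if you prefer scripting, an equivalent CLI launch might look like the sketch below. The stack name and local template path are placeholders; the parameter names mirror the wizard above, and IAM capabilities are acknowledged because the template creates IAM roles.

```bash
# Hypothetical stack name and template file; parameters mirror the console wizard.
aws cloudformation create-stack \
  --stack-name deepseek-llm-inference \
  --template-body file://deepseek-as-a-service.yaml \
  --capabilities CAPABILITY_NAMED_IAM \
  --parameters \
      ParameterKey=ClusterName,ParameterValue=llm-inference \
      ParameterKey=CreateVPC,ParameterValue=true
```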
- Monitor deployment (~15-20 min)
- In the CloudFormation console, watch the Events tab; status will progress from CREATE_IN_PROGRESS to CREATE_COMPLETE (a CLI equivalent is sketched after this list).
- First, CloudFormation creates an EKS cluster.
- Next, CloudFormation starts an EC2 builder instance to configure the EKS cluster. The instance installs kubectl, Helm, Git, and the NVIDIA device plugin, and deploys the Helm chart onto the cluster. During this phase, a GPU node joins the cluster. Once the setup is finished, the EC2 instance signals success and self-terminates.
- Finally, a validated ACM certificate is issued, allowing the load balancer on the EKS cluster to be accessed via a friendly URL.
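If you would rather watch from a terminal than the console, a rough CLI equivalent is sketched below (stack name as in the earlier launch sketch).

```bash
# Show the ten most recent stack events, newest first.
aws cloudformation describe-stack-events \
  --stack-name deepseek-llm-inference \
  --query 'StackEvents[0:10].[ResourceStatus,LogicalResourceId]' \
  --output table

# Or simply block until the stack reaches CREATE_COMPLETE (or fails).
aws cloudformation wait stack-create-complete \
  --stack-name deepseek-llm-inference
```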
- Locate your endpoint when the stack deployment is complete:
- Open the Outputs tab; copy the value of CustomerDomain (e.g., https://<your-unique-identifier>.aws.aumlabs.ai/docs).
- The same URL is stored in SSM Parameter Store at /eks/<ClusterName>/customer-domain (both locations can be read from the CLI, as sketched below).
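A small sketch of reading either location from the CLI, assuming the default stack and cluster names used above:

```bash
# Read the service URL from the stack outputs...
aws cloudformation describe-stacks \
  --stack-name deepseek-llm-inference \
  --query 'Stacks[0].Outputs[?OutputKey==`CustomerDomain`].OutputValue' \
  --output text

# ...or from SSM Parameter Store (ClusterName defaults to llm-inference).
aws ssm get-parameter \
  --name /eks/llm-inference/customer-domain \
  --query 'Parameter.Value' \
  --output text
```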
- Test the model: Open https://<your-unique-identifier>.aws.aumlabs.ai/docs to access the auto-generated Swagger/OpenAPI UI, or invoke the endpoint directly:
The API endpoint also supports batch calls: you can send a list of prompts in a single request.
    curl -X POST https://<your-unique-hash>.aws.aumlabs.ai/generate \
      -H "Content-Type: application/json" \
      -d '{ "prompts": ["Solve for x: 4x-9= 15 <think>\n", "Find b, where 8^2 + b^2 = 17^2 <think>\n"] }'

Expect a JSON response containing the model's replies in the following format:

    { "responses": [ "<Reply to Query 1>", "<Reply to Query 2>" ] }
- Operate & scale: After launch you can manage the cluster just like any EKS environment: scale nodes or extend it with additional micro-services if needed (a scaling sketch follows below). The service is production-ready out of the box, so you don't need to make changes unless absolutely required.
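For example, resizing the GPU node group uses the standard EKS CLI calls; the sketch below assumes the default cluster name, and the node-group name is whatever the stack created, so list it first.

```bash
# Find the node group the stack created, then adjust its size.
aws eks list-nodegroups --cluster-name llm-inference
aws eks update-nodegroup-config \
  --cluster-name llm-inference \
  --nodegroup-name <gpu-nodegroup-name> \
  --scaling-config minSize=1,maxSize=3,desiredSize=2
```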
- Cleanup: Delete the CloudFormation stack to remove all resources: EKS, GPU instances, load balancer, VPC (if created), ACM certificate, IAM roles, and SSM parameters. There is no leftover cost once the stack is gone (a CLI equivalent is sketched below).
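The same cleanup can be done from the CLI, using the stack name from the earlier sketches:

```bash
# Tear everything down and wait for the deletion to finish.
aws cloudformation delete-stack --stack-name deepseek-llm-inference
aws cloudformation wait stack-delete-complete --stack-name deepseek-llm-inference
```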
This product does not collect or export customer data to external systems outside of your AWS account.
Need help or have feature requests? Use the Support tab on the AWS Marketplace listing or email support@aumlabs.ai.
Support
Vendor support
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.