
    DeepSeek R1 Distill Qwen 1.5B

    Sold by: AUM Labs 
    Deployed on AWS
    Free Trial
    A self-hosted production-ready DeepSeek-R1-Distill-Qwen-1.5B model (https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) running seamlessly in your private AWS cloud! With an easy single-click installation, set up all the essential infrastructure in your own cloud environment hassle-free. Plus, you will have quick access to an API endpoint that is ready for your queries and scales automatically based on your needs. Best of all, with the service operating solely in your cloud, your data remains completely secure and confidential, never leaving your private space. Experience peace of mind and unleash the full potential of DeepSeek R1 models today!

    Overview

    This service offers a hosted version of the DeepSeek-R1-Distill-Qwen-1.5B model (https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B), which operates within your private cloud. After you subscribe to the listing, a CloudFormation deployment starts in your AWS account, setting up an EKS cluster that runs an inference service for the DeepSeek-R1-Distill-Qwen-1.5B model. Once the installation is complete, an API endpoint is made available for seamless service queries.

    The DeepSeek-R1-Distill-Qwen-1.5B model is fine-tuned from the open-source Qwen model using samples generated by DeepSeek-R1. The DeepSeek team showed that the reasoning patterns discovered with reinforcement learning in a giant 671B model can be compressed into tiny dense models without much loss. This 1.5B checkpoint is the smallest of those distillations.

    DeepSeek-R1-Distill-Qwen-1.5B punches well above its weight in math- and code-heavy reasoning while still fitting on a single laptop GPU (~4 GB in 8-bit). Use it whenever you need solid chain-of-thought performance under tight VRAM/latency budgets.

    Architecture: 1.78B-parameter decoder-only Transformer (Qwen2.5-Math-1.5B base) distilled from the 671B-parameter DeepSeek-R1 reasoning model

    Context length: 32,768 tokens (inherits Qwen 2.5 long-context support)
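    As a rough sanity check on the "~4 GB in 8-bit" figure quoted above, weight memory is simply parameter count times bytes per parameter. A back-of-envelope sketch (pure arithmetic; the ~4 GB figure presumably also covers runtime overhead and KV cache beyond the raw weights):

```python
# Back-of-envelope weight-memory estimate for an ~1.78B-parameter model.
# Raw weights only: runtime overhead and KV cache are extra.
def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Weight memory in GiB: params * bytes per param / 2**30."""
    return params * bytes_per_param / 1024**3

PARAMS = 1.78e9  # parameter count from the Architecture line above

print(f"fp16 weights: {weight_memory_gb(PARAMS, 2):.1f} GB")   # ~3.3 GB
print(f"int8 weights: {weight_memory_gb(PARAMS, 1):.1f} GB")   # ~1.7 GB
print(f"int4 weights: {weight_memory_gb(PARAMS, 0.5):.1f} GB") # ~0.8 GB
```

    Int8 weights alone come to roughly 1.7 GB, consistent with the ~4 GB total once inference-time overhead is added.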

    Highlights

    • Privately hosted version of the Qwen-based DeepSeek R1 1.5B model running securely in your cloud. Never worry about data leaving your cloud.
    • Performance of DeepSeek-R1-Distill-Qwen-1.5B on various benchmarks: AIME 2024 pass@1 28.9; AIME 2024 cons@64 52.7; MATH-500 pass@1 83.9; GPQA Diamond pass@1 33.8; LiveCodeBench pass@1 16.9; Codeforces rating 954.
    • DeepSeek-R1-Distill-Qwen-1.5B is a rare mix of tiny footprint and serious analytical power. When your problem looks more like an Olympiad question or a LeetCode hard than a casual conversation, and you only have laptop-grade hardware, this is the model to load.

    Details

    Delivery method

    Delivery option
    Install DeepSeek as a Service stack via the CloudFormation template

    Latest version

    Operating system
    AmazonLinux 2023


    Features and programs

    Financing for AWS Marketplace purchases

    AWS Marketplace now accepts line of credit payments through the PNC Vendor Finance program. This program is available to select AWS customers in the US, excluding NV, NC, ND, TN, & VT.

    Pricing

    Free trial

    Try this product free for 5 days according to the free trial terms set by the vendor. Usage-based pricing is in effect for usage beyond the free trial terms. Your free trial gets automatically converted to a paid subscription when the trial ends, but may be canceled any time before that.

    DeepSeek R1 Distill Qwen 1.5B

    Pricing is based on actual usage, with charges varying according to how much you consume. Subscriptions have no end date and may be canceled any time.
    Additional AWS infrastructure costs may apply. Use the AWS Pricing Calculator to estimate your infrastructure costs.

    Usage costs (12)

    Dimension       Cost/hour
    g5.8xlarge      $0.10
    g5.2xlarge      $0.10
    p5e.48xlarge    $0.10
    g5.4xlarge      $0.10
    p5.48xlarge     $0.10
    p4d.24xlarge    $0.10
    g5.16xlarge     $0.10
    g5.24xlarge     $0.10
    g5.12xlarge     $0.10
    p5en.48xlarge   $0.10
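    Since every dimension in the table shares the same flat $0.10/hour software rate, the monthly software charge is easy to estimate. The sketch below covers the software fee only; EC2/EKS infrastructure is billed separately by AWS and depends on the instance type you choose:

```python
# Monthly software cost at the listed flat rate of $0.10/hour.
# AWS infrastructure (EC2, EKS, NAT gateways, NLB) is billed separately.
SOFTWARE_RATE_USD_PER_HOUR = 0.10  # from the pricing table above

hours = 24 * 30  # a 30-day month
software_cost = SOFTWARE_RATE_USD_PER_HOUR * hours
print(f"Software cost for a 30-day month: ${software_cost:.2f}")  # $72.00
```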

    Vendor refund policy


    Legal

    Vendor terms and conditions

    Upon subscribing to this product, you must acknowledge and agree to the terms and conditions outlined in the vendor's End User License Agreement (EULA).

    Content disclaimer

    Vendors are responsible for their product descriptions and other product content. AWS does not warrant that vendors' product descriptions or other product content are accurate, complete, reliable, current, or error-free.

    Usage information


    Delivery details

    Install DeepSeek as a Service stack via the CloudFormation template

    Launch a production-ready DeepSeek-as-a-Service in minutes with this turnkey CloudFormation template. It automatically provisions an EKS cluster (with optional VPC creation), a GPU-powered node group, a secure ACM-validated domain, and a fully configured Helm deployment of the DeepSeek model; no manual Kubernetes tinkering required. Simply deploy the stack and start serving fast, scalable generative-AI endpoints from your own AWS account.

    Key capabilities

    • One-click deployment – spin up the entire stack in ~15 minutes without touching kubectl, Helm charts, or ACM.
    • GPU-optimized – node group is pre-sized for latency-critical inference and ships with NVIDIA’s device plugin.
    • Automatic HTTPS – a Lambda workflow requests an ACM certificate, validates DNS, and wires TLS to the load balancer; the issued customer domain is stored in SSM and pushed to your SaaS account via SNS.
    • Bring-your-own or new VPC – the template can search for a compatible private-subnet pair with NAT gateways, or build an isolated /16 VPC complete with public & private subnets, IGWs, NAT gateways, and route tables.
    • Self-cleaning bootstrap – a short-lived EC2 builder instance handles cluster configuration, Helm install, and signals CloudFormation, then terminates itself to avoid idle costs.
    • Full control – after launch you have full control of the cluster just like any EKS environment: scale nodes, roll images, or extend with additional micro-services.

    What the template creates

    1. (Optional) New IPv4 /16 VPC with two AZ-balanced public & private subnets, route tables, IGW, and redundant NAT gateways.
    2. Amazon EKS cluster (v1.32) with dedicated control-plane security group and API server endpoints opened for HTTPS.
    3. GPU node group (A10, A100 or H100 GPUs, AL2_x86_64_GPU AMI, 100 GB EBS).
    4. Amazon ACM certificate for a unique sub-domain (e.g., .aws.aumlabs.ai) and an NLB configured with HTTP → HTTPS redirect and TLS offload.
    5. Helm-based deployment (llm-inference namespace) of the DeepSeek container image.
    6. AWS SSM Parameter /eks/<ClusterName>/customer-domain containing the final service URL, for easy CI/CD integration.
    7. Helper Lambdas and IAM roles to automate VPC discovery, certificate issuance, secret propagation, and SNS notifications.
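    Item 6 above makes the endpoint discoverable from scripts. A minimal sketch of reading it back, assuming the default cluster name llm-inference; the fetch_customer_domain helper is illustrative and requires boto3 plus configured AWS credentials, while the path builder runs anywhere:

```python
def customer_domain_param(cluster_name: str) -> str:
    # SSM parameter path follows the pattern /eks/<ClusterName>/customer-domain.
    return f"/eks/{cluster_name}/customer-domain"

def fetch_customer_domain(cluster_name: str = "llm-inference") -> str:
    # Illustrative helper: requires boto3 and AWS credentials/region configured.
    import boto3
    ssm = boto3.client("ssm")
    resp = ssm.get_parameter(Name=customer_domain_param(cluster_name))
    return resp["Parameter"]["Value"]

print(customer_domain_param("llm-inference"))  # /eks/llm-inference/customer-domain
```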

    CloudFormation Template (CFT)

    AWS CloudFormation templates are JSON or YAML-formatted text files that simplify provisioning and management on AWS. The templates describe the service or application architecture you want to deploy, and AWS CloudFormation uses those templates to provision and configure the required services (such as Amazon EC2 instances or Amazon RDS DB instances). The deployed application and associated resources are called a "stack."

    Version release notes

    DeepSeek R1 Distill Qwen 1.5B as a service:

    Privately hosted version of DeepSeek R1 Distill Qwen 1.5B (https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) running in your cloud! Explore the full potential of DeepSeek R1 models without ever worrying about data leaving your cloud!

    Additional details

    Usage instructions

    Follow the steps below to launch, configure, and run the DeepSeek LLM-Inference stack in your own AWS account.

    1. Prerequisites
    • AWS account with permissions to create VPC, EKS, EC2 (g5/p4/p5), ACM, IAM, SSM, and SNS resources.
    • Sufficient g5/p4/p5 GPU quota in two AZs of your chosen region. You need a quota of at least 8 vCPUs to launch the stack on the cheapest available A10 GPU instance (g5.xlarge). You can request a quota increase at https://us-east-1.console.aws.amazon.com/servicequotas/home/services/ec2/quotas/L-DB2E81BA
    • (Optional) Existing VPC with >= 2 private subnets that have NAT-gateway egress if you plan to reuse your own network. Otherwise, the stack automatically creates a new VPC and subnets in your AWS account.
    2. Subscribe & launch
    • Click Continue to Subscribe on the AWS Marketplace page and accept the terms.
    • Choose Continue to Configuration -> Continue to Launch.
    • Select the CloudFormation delivery option and choose the region in which you want to deploy.
    • Press Launch to open the stack in CloudFormation, then Next to the parameter wizard.
    3. (Optional) Configure stack parameters
    • ClusterName - Friendly name for the EKS cluster (default llm-inference).
    • CreateVPC - true (build a new /16 VPC) or false (use an existing one automatically discovered) (default set to true). Leave the parameters at their defaults unless you have a specific reason to change them. Click Next, add optional tags, then Create stack.
    4. Monitor deployment (~15 - 20 min)
    • In the CloudFormation console watch the Events tab; status will progress through CREATE_IN_PROGRESS -> CREATE_COMPLETE.
    • First, CloudFormation creates an EKS cluster.
    • Next, CloudFormation starts an EC2 instance to set up the EKS cluster. The EC2 instance installs kubectl, Helm, Git, and the NVIDIA device plugin, and deploys the Helm chart on the EKS cluster. During this phase, a GPU node joins the EKS cluster. Once the cluster setup is finished, the EC2 instance self-terminates and signals success.
    • Finally, a validated ACM certificate is issued, allowing the load balancer on the EKS cluster to be accessed via a friendly URL.
    5. Locate your endpoint when the stack deployment is complete:
    • Open the Outputs tab; copy the value of CustomerDomain (e.g., https://<your-unique-identifier>.aws.aumlabs.ai/docs).
    • The same URL is stored in SSM Parameter Store at /eks/<ClusterName>/customer-domain
    6. Test the model
    Open https://<your-unique-identifier>.aws.aumlabs.ai/docs to access the auto-generated Swagger / OpenAPI UI, or invoke the endpoint directly:
    curl -X POST https://<your-unique-hash>.aws.aumlabs.ai/generate \
      -H "Content-Type: application/json" \
      -d '{"prompts": ["Solve for x: 4x - 9 = 15 <think>\n"]}'

    The API endpoint also supports batch calls. You can send multiple prompts in a single JSON array.

    curl -X POST https://<your-unique-hash>.aws.aumlabs.ai/generate \
      -H "Content-Type: application/json" \
      -d '{"prompts": ["Solve for x: 4x - 9 = 15 <think>\n", "Find b, where 8^2 + b^2 = 17^2 <think>\n"]}'

    Expect a JSON response containing the model's reply in the following format:

    { "responses": [ "<Reply to Query 1>", "<Reply to Query 2>" ] }
    7. Operate & scale
    After launch you can manage the cluster just like any EKS environment: scale nodes, or extend with additional micro-services if needed. The service is already production-ready, so you don't need to make changes unless absolutely required.

    8. Cleanup
    Delete the CloudFormation stack to remove all resources: EKS, GPU instances, load balancer, VPC (if created), ACM certificate, IAM roles, and SSM parameters. There is no leftover cost once the stack is gone.

    This product does not collect or export customer data to external systems outside of your AWS account.

    Need help or have feature requests? Use the Support tab on the AWS Marketplace listing or email support@aumlabs.ai.

    Support

    AWS infrastructure support

    AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.


    Customer reviews

    Ratings and reviews

    0 ratings
    0 AWS reviews
    No customer reviews yet
    Be the first to review this product. We've partnered with PeerSpot to gather customer feedback. You can share your experience by writing or recording a review, or scheduling a call with a PeerSpot analyst.