
    Friendli Container

    Deployed on AWS
    Run generative AI models on your own AWS cloud with blazing-fast inference, reduced cost, and full data control, powered by FriendliAI's high-performance containerized engine.

    Overview

    Friendli Container delivers FriendliAI's high-performance inference engine as a portable, production-ready Docker container that runs on your infrastructure, whether in Amazon EKS, private cloud, or on-premise environments. It enables enterprises to securely serve generative AI models and custom fine-tunes with up to 3x faster output and 50-90% lower GPU usage compared to open-source stacks.

    Designed for organizations with strict data security, compliance, and performance requirements, Friendli Container supports full VPC isolation. It integrates natively with Prometheus and Grafana for real-time observability, allowing teams to track throughput, latency, TTFT, and caching efficiency at production scale. Friendli's patented technologies, such as Continuous Batching™, Friendli TCache, and optimized quantization, enable extremely low-latency inference without sacrificing model quality.

    Typical use cases include deploying multimodal generative AI models behind the firewall to meet regulatory or privacy constraints, running RAG-enhanced enterprise search systems, serving fine-tuned models for customer service or agent tasks, and powering AI APIs that demand sub-second response times under fluctuating load.

    Friendli Container ships with a high-level Kubernetes CRD (Custom Resource Definition), so launching an inference endpoint is as easy as creating a Deployment in Kubernetes. If you need fast, flexible, and secure generative AI inference across text, image, and code without over-provisioning GPUs or stitching together infrastructure, Friendli Container is a drop-in solution for scalable, cost-effective AI deployment.

    Highlights

    • Blazing-Fast Output Speeds: Fastest token generation among GPU-based providers, powered by patented continuous batching and speculative decoding
    • 50-90% GPU Reduction: Serve the same workload with a fraction of the GPU footprint, cutting cost and idle resource waste
    • Integrated Observability: Friendli Container exposes real-time metrics in the Prometheus text format, unlocking useful insights into latency, throughput, cache hit ratios, and more

    Details

    Delivery method

    Supported services
    • Amazon EKS

    Delivery option
    Amazon EKS console add-on

    Latest version

    Operating system
    Linux



    Pricing

    Friendli Container

    Pricing is based on actual usage, with charges varying according to how much you consume. Subscriptions have no end date and may be canceled at any time.
    Additional AWS infrastructure costs may apply. Use the AWS Pricing Calculator to estimate your infrastructure costs.

    Usage costs (7)

    Dimension      Description                      Cost/host/hour
    NVIDIA B200    GPU hours running NVIDIA B200    $1.25
    NVIDIA H200    GPU hours running NVIDIA H200    $1.00
    NVIDIA H100    GPU hours running NVIDIA H100    $1.00
    NVIDIA A100    GPU hours running NVIDIA A100    $0.75
    NVIDIA L40S    GPU hours running NVIDIA L40S    $0.30
    NVIDIA A10G    GPU hours running NVIDIA A10G    $0.20
    NVIDIA L4      GPU hours running NVIDIA L4      $0.20

    Vendor refund policy

    We do not currently offer refunds.


    Legal

    Vendor terms and conditions

    Upon subscribing to this product, you must acknowledge and agree to the terms and conditions outlined in the vendor's End User License Agreement (EULA).

    Content disclaimer

    Vendors are responsible for their product descriptions and other product content. AWS does not warrant that vendors' product descriptions or other product content are accurate, complete, reliable, current, or error-free.

    Usage information


    Delivery details

    Amazon EKS console add-on

    Supported services:
    • Amazon EKS
    EKS add-on

    An add-on is software that provides supporting operational capabilities to Kubernetes applications but isn't specific to the application. This includes software like observability agents or Kubernetes drivers that allow the cluster to interact with underlying AWS resources for networking, compute, and storage. Add-on software is typically built and maintained by the Kubernetes community, cloud providers like AWS, or third-party vendors.

    Amazon EKS add-ons provide installation and management of a curated set of add-ons for Amazon EKS clusters. All Amazon EKS add-ons include the latest security patches and bug fixes, and are validated by AWS to work with Amazon EKS. They let you consistently keep your Amazon EKS clusters secure and stable while reducing the work needed to install, configure, and update add-ons.

    Version release notes

    • Added support for Kubernetes 1.33
    • Added option to retain HuggingFace cache on hostPath
    • Added option to use existing PVC for base model checkpoint
    • Added a convenience method to turn any AWS S3 bucket into a read-only PVC

    Additional details

    Usage instructions

    Step 1: Add a GPU Node Group to your EKS Cluster.

    Step 2: Install the Friendli Container EKS add-on.

    Step 3: Configure IAM roles for service accounts (IRSA) so that the Kubernetes ServiceAccount that runs your inference pods can exercise the AWSMarketplaceMeteringFullAccess policy on your behalf.

    Step 4 (Optional): To deploy a private or gated model from the HuggingFace model hub, create a Kubernetes Secret containing a token with read permission for the model.

    Step 5: Create a FriendliDeployment object on the Kubernetes cluster with the model you want to run, the name of the Node Group you created in Step 1, and optionally the name of the Secret you created in Step 4.

    Check https://github.com/friendliai/examples/tree/main/aws/eks-addon  for the detailed instructions and examples, or read on.

    Step 1: Add a GPU Node Group to your EKS Cluster

    Open the Amazon EKS console, choose the cluster you want to create a node group in, select the "Compute" tab, and click "Add node group". Friendli Container supports NVIDIA L4, A10G, L40S, A100, H100, H200, and B200 GPUs.
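    If you prefer the command line, the same node group can be created with eksctl. The group name and instance type below are illustrative assumptions (g5.xlarge instances each carry one NVIDIA A10G GPU); pick the instance family that matches your target GPU:

```shell
# Hypothetical example: create a GPU node group with eksctl.
# "friendli-gpu-nodes" and g5.xlarge are illustrative choices only.
eksctl create nodegroup \
  --cluster <CLUSTER> \
  --region <REGION> \
  --name friendli-gpu-nodes \
  --node-type g5.xlarge \
  --nodes 1 --nodes-min 1 --nodes-max 2
```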

    Step 2: Install the Friendli Container EKS add-on

    Open the Amazon EKS console, choose the cluster you want to configure, select the "Add-ons" tab, and click "Get more add-ons". Scroll down to the "AWS Marketplace add-ons" section, then search for and check "Friendli Container".

    We recommend also installing the following EKS add-ons from AWS: Amazon VPC CNI, CoreDNS, kube-proxy, and Amazon EKS Pod Identity Agent.

    Step 3: Configure IAM roles for service accounts (IRSA)

    This is needed to allow the Kubernetes ServiceAccount to contact AWS Marketplace for license validation. The ServiceAccount that runs your inference pods must be able to exercise the AWSMarketplaceMeteringFullAccess policy on your behalf.

    First, create an IAM OIDC provider for the cluster. Using eksctl, execute:

    eksctl utils associate-iam-oidc-provider --region <REGION> --cluster <CLUSTER> --approve

    The steps are detailed on https://docs.aws.amazon.com/eks/latest/userguide/enable-iam-roles-for-service-accounts.html 

    Second, assign an IAM role to the Kubernetes ServiceAccount.

    If you want to use the "default" ServiceAccount in the "default" namespace to run the inference pods, execute:

    eksctl create iamserviceaccount --region <REGION> --cluster <CLUSTER> --namespace default --name default --role-name AWSMarketplaceMeteringAccessForFriendliContainer --attach-policy-arn arn:aws:iam::aws:policy/AWSMarketplaceMeteringFullAccess --approve --override-existing-serviceaccounts

    The steps are detailed on https://docs.aws.amazon.com/eks/latest/userguide/associate-service-account-role.html 
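    After the role is attached, a quick way to confirm the wiring (assuming the "default" ServiceAccount in the "default" namespace from the command above) is to check for the role annotation that eksctl adds:

```shell
# The Annotations field should include eks.amazonaws.com/role-arn
# pointing at the IAM role created above.
kubectl describe serviceaccount default --namespace default
```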

    Step 4 (Optional): Create a secret with a HuggingFace token

    To deploy a private or gated model from HuggingFace, you need to create an access token with "read" permission on the model. Assuming you want to create a secret named "hf-secret", execute:

    kubectl create secret generic hf-secret --from-literal token=<YOUR_TOKEN_HERE>

    Step 5: Create a FriendliDeployment object on the Kubernetes Cluster

    Write a FriendliDeployment specification. An example can be found at https://github.com/friendliai/examples/tree/main/aws/eks-addon 

    Then create the object using the kubectl tool:

    kubectl apply -f friendli-deployment.yaml
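    For orientation only, a FriendliDeployment manifest might be sketched as below. Every field name here is a hypothetical placeholder, not the actual schema; take the real structure from the examples repository linked above.

```shell
# Hypothetical FriendliDeployment sketch -- the spec field names are
# illustrative placeholders; consult the examples repo for the real schema.
cat > friendli-deployment.yaml <<'EOF'
apiVersion: friendli.ai/v1                # assumed API group/version
kind: FriendliDeployment
metadata:
  name: my-llm
spec:
  model: <HUGGINGFACE_MODEL_ID>           # model to serve
  nodeGroup: <GPU_NODE_GROUP_NAME>        # Node Group from Step 1
  hfSecret: <HF_TOKEN_SECRET_NAME>        # optional, Secret from Step 4
EOF
```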

    After the deployment is done, you can port-forward to the Kubernetes Service that was created and execute inference requests.
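    As a smoke test, you could forward the Service port locally and send a request. The Service name (my-llm), port (8000), and the OpenAI-compatible /v1/chat/completions path below are assumptions; check kubectl get svc and the product documentation for your actual values.

```shell
# Forward local port 8000 to the Service created for the deployment
# ("my-llm" and port 8000 are assumed; check `kubectl get svc`).
kubectl port-forward svc/my-llm 8000:8000 &

# Send a test request, assuming an OpenAI-compatible endpoint.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}]}'
```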

    Support

    Vendor support

    You can reach our support staff at support@friendli.ai 

    AWS infrastructure support

    AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.

    Similar products

    Customer reviews

    Ratings and reviews

    0 ratings
    0 AWS reviews
    No customer reviews yet

    Be the first to review this product. We've partnered with PeerSpot to gather customer feedback. You can share your experience by writing or recording a review, or scheduling a call with a PeerSpot analyst.