Overview
Friendli Container delivers FriendliAI's high-performance inference engine as a portable, production-ready Docker container that runs on your own infrastructure, whether in Amazon EKS, a private cloud, or an on-premises environment. It enables enterprises to securely serve generative AI models and custom fine-tunes with up to 3x faster output and 50-90% lower GPU usage compared to open-source stacks.
Designed for organizations with strict data security, compliance, and performance requirements, Friendli Container supports full VPC isolation. It integrates natively with Prometheus and Grafana for real-time observability, allowing teams to track throughput, latency, time to first token (TTFT), and caching efficiency at production scale. Friendli's patented technologies, such as Continuous Batching™, Friendli TCache, and optimized quantization, enable extremely low-latency inference without sacrificing model quality.
Typical use cases include deploying multimodal generative AI models behind the firewall to meet regulatory or privacy constraints, running RAG-enhanced enterprise search systems, serving fine-tuned models for customer service or agent tasks, and powering AI APIs that demand sub-second response times under fluctuating load.
Friendli Container ships with a high-level Kubernetes CRD (Custom Resource Definition), so launching an inference endpoint is as easy as launching a Deployment in Kubernetes. If you need fast, flexible, and secure generative AI inference across text, image, and code without over-provisioning GPUs or stitching together infrastructure, Friendli Container is a drop-in solution for scalable, cost-effective AI deployment.
Highlights
- Blazing-Fast Output Speeds: Fastest token generation among GPU-based providers, powered by patented continuous batching and speculative decoding
- 50-90% GPU Reduction: Serve the same workload with a fraction of the GPU footprint, cutting cost and idle resource waste
- Integrated Observability: Friendli Container exposes real-time metrics in the Prometheus text format, unlocking useful insights into latency, throughput, cache hit ratios, and more
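For example, a minimal Prometheus scrape job for a pod exposing these metrics might look like the sketch below; the metrics port and path are assumptions made for illustration, so consult the container documentation for the actual values.

scrape_configs:
  - job_name: friendli-container
    metrics_path: /metrics              # assumed text-format metrics endpoint
    static_configs:
      - targets: ['<POD_IP>:8281']      # hypothetical metrics port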
Pricing
Dimension | Description | Cost/host/hour |
---|---|---|
NVIDIA B200 | GPU hours running NVIDIA B200 | $1.25 |
NVIDIA H200 | GPU hours running NVIDIA H200 | $1.00 |
NVIDIA H100 | GPU hours running NVIDIA H100 | $1.00 |
NVIDIA A100 | GPU hours running NVIDIA A100 | $0.75 |
NVIDIA L40S | GPU hours running NVIDIA L40S | $0.30 |
NVIDIA A10G | GPU hours running NVIDIA A10G | $0.20 |
NVIDIA L4 | GPU hours running NVIDIA L4 | $0.20 |
Vendor refund policy
We do not currently offer refunds.
Delivery details
Amazon EKS console add-on
- Amazon EKS
EKS add-on
An add-on is software that provides supporting operational capabilities to Kubernetes applications but isn't specific to the application. This includes software like observability agents or Kubernetes drivers that allow the cluster to interact with underlying AWS resources for networking, compute, and storage. Add-on software is typically built and maintained by the Kubernetes community, cloud providers like AWS, or third-party vendors.
Amazon EKS add-ons provide installation and management of a curated set of add-ons for Amazon EKS clusters. All Amazon EKS add-ons include the latest security patches and bug fixes, and are validated by AWS to work with Amazon EKS. Amazon EKS add-ons allow you to consistently ensure that your Amazon EKS clusters are secure and stable, and reduce the amount of work needed to install, configure, and update add-ons.
Version release notes
- Added support for Kubernetes 1.33
- Added option to retain the HuggingFace cache on a hostPath
- Added option to use an existing PVC for the base model checkpoint
- Added a convenience method to turn any AWS S3 bucket into a read-only PVC
Additional details
Usage instructions
Step 1: Add a GPU Node Group to your EKS Cluster
Step 2: Install Friendli Container EKS add-on
Step 3: Configure IAM roles for service accounts (IRSA) so that the Kubernetes ServiceAccount that runs your inference pods can exercise the AWSMarketplaceMeteringFullAccess policy on your behalf.
Step 4 (Optional): To deploy a private or gated model from the HuggingFace model hub, create a Kubernetes Secret containing a token with read permission for the model.
Step 5: Create a FriendliDeployment object on the Kubernetes cluster with the model you want to run, the name of the Node Group you created in Step 1, and, optionally, the name of the Secret you created in Step 4.
See https://github.com/friendliai/examples/tree/main/aws/eks-addon for detailed instructions and examples, or read on.
Step 1: Add a GPU Node Group to your EKS Cluster
Open the Amazon EKS console, choose the cluster in which you want to create a node group, select the "Compute" tab, and click "Add node group". Friendli Container supports NVIDIA L4, A10G, L40S, A100, H100, H200, and B200 GPUs.
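If you prefer the command line over the console, a rough eksctl equivalent is sketched below; the node group name, instance type, and sizes are assumptions, so pick an instance family that carries one of the supported GPUs (for example, g5 for A10G or p5 for H100).

# Hypothetical node group; adjust the instance type and counts to your workload.
eksctl create nodegroup --cluster <CLUSTER> --region <REGION> --name friendli-gpu --node-type g5.2xlarge --nodes 1 --nodes-min 1 --nodes-max 2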
Step 2: Install Friendli Container EKS add-on
Open the Amazon EKS console, choose the cluster that you want to configure, select the "Add-ons" tab, and click "Get more add-ons". Scroll down to the "AWS Marketplace add-ons" section, then search for and check "Friendli Container".
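If you prefer the AWS CLI, the installation can be sketched as follows; the exact add-on name is an assumption here, so list the available add-on names first and use the entry that matches Friendli Container.

# List add-on names visible to your account, then install the matching one.
aws eks describe-addon-versions --query 'addons[].addonName' --output text
aws eks create-addon --cluster-name <CLUSTER> --region <REGION> --addon-name <FRIENDLI_ADDON_NAME>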
We also recommend installing the following EKS add-ons from AWS: Amazon VPC CNI, CoreDNS, kube-proxy, and Amazon EKS Pod Identity Agent.
Step 3: Configure IAM roles for service accounts (IRSA)
This step allows the Kubernetes ServiceAccount to contact AWS Marketplace for license validation. The ServiceAccount that runs your inference pods must be able to exercise the AWSMarketplaceMeteringFullAccess policy on your behalf.
First, create an IAM OIDC provider for the cluster. Using eksctl, execute
eksctl utils associate-iam-oidc-provider --region <REGION> --cluster <CLUSTER> --approve
The steps are detailed at https://docs.aws.amazon.com/eks/latest/userguide/enable-iam-roles-for-service-accounts.html
Second, assign an IAM role to the Kubernetes ServiceAccount.
If you want to use the "default" ServiceAccount in the "default" namespace to run the inference pods, execute
eksctl create iamserviceaccount --region <REGION> --cluster <CLUSTER> --namespace default --name default --role-name AWSMarketplaceMeteringAccessForFriendliContainer --attach-policy-arn arn:aws:iam::aws:policy/AWSMarketplaceMeteringFullAccess --approve --override-existing-serviceaccounts
The steps are detailed at https://docs.aws.amazon.com/eks/latest/userguide/associate-service-account-role.html
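To verify that the role was attached, inspect the ServiceAccount's annotations (a quick check, assuming the "default"/"default" account from the command above):

kubectl get serviceaccount default -n default -o yaml

The output should contain an eks.amazonaws.com/role-arn annotation referencing the AWSMarketplaceMeteringAccessForFriendliContainer role created above.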
Step 4 (Optional): Create a secret with a HuggingFace token.
To deploy a private or gated model from HuggingFace, you need to create an access token with "read" permission on the model. Assuming you want to create a secret named "hf-secret", execute
kubectl create secret generic hf-secret --from-literal token=<YOUR_TOKEN_HERE>
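To double-check the stored token, you can read the secret back and decode it (assuming the "hf-secret" name from above):

kubectl get secret hf-secret -o jsonpath='{.data.token}' | base64 --decode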
Step 5: Create a FriendliDeployment object on the Kubernetes cluster.
Write a FriendliDeployment specification. An example can be found at https://github.com/friendliai/examples/tree/main/aws/eks-addon
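For orientation, a shape-level sketch of such a specification is shown below; every field name in the spec block is an assumption made for illustration, so treat the examples repository above as the authoritative schema. Save the file as, for example, friendli-deployment.yaml.

apiVersion: friendli.ai/v1            # hypothetical API group/version
kind: FriendliDeployment
metadata:
  name: my-llm
spec:
  model: <HF_MODEL_ID>                # HuggingFace model to serve
  nodeGroup: <GPU_NODE_GROUP>         # Node Group created in Step 1 (assumed field name)
  hfSecretName: hf-secret             # Secret from Step 4 (assumed field name)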
Then create the object using the kubectl tool:
kubectl apply -f friendli-deployment.yaml
After the deployment is done, you can port-forward to the Kubernetes Service that was created and execute inference requests.
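As a smoke test, the request flow might look like the sketch below; the Service name, port, and the OpenAI-compatible chat path are assumptions, so list the Services the add-on created first.

# Find the Service, forward a local port to it, then send a test request.
kubectl get svc
kubectl port-forward svc/<FRIENDLI_SERVICE> 8000:<SERVICE_PORT>
curl http://localhost:8000/v1/chat/completions -H 'Content-Type: application/json' -d '{"messages": [{"role": "user", "content": "Hello!"}]}'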
Support
Vendor support
You can reach our support staff at support@friendli.ai.
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.