Overview
Friendli Container delivers FriendliAI's high-performance inference engine as a portable, production-ready Docker container that runs on your own infrastructure, whether in Amazon EKS, a private cloud, or an on-premises environment. It enables enterprises to securely serve generative AI models and custom fine-tunes with up to 3x faster output and 50-90% lower GPU usage compared to open-source stacks.
Designed for organizations with strict data security, compliance, and performance requirements, Friendli Container supports full VPC isolation. It integrates natively with Prometheus and Grafana for real-time observability, allowing teams to track throughput, latency, time to first token (TTFT), and caching efficiency at production scale. Friendli's patented technologies, such as Continuous Batching™, Friendli TCache, and optimized quantization, enable extremely low-latency inference without sacrificing model quality.
Typical use cases include deploying multimodal generative AI models behind the firewall to meet regulatory or privacy constraints, running RAG-enhanced enterprise search systems, serving fine-tuned models for customer service or agent tasks, and powering AI APIs that demand sub-second response times under fluctuating load.
Friendli Container ships with a high-level Kubernetes CRD (Custom Resource Definition), so launching an inference endpoint is as easy as launching a Deployment in Kubernetes. If you need fast, flexible, and secure generative AI inference across text, image, and code without over-provisioning GPUs or stitching together infrastructure, Friendli Container is a drop-in solution for scalable, cost-effective AI deployment.
Highlights
- Blazing-Fast Output Speeds: Fastest token generation among GPU-based providers, powered by patented continuous batching and speculative decoding
- 50-90% GPU Reduction: Serve the same workload with a fraction of the GPU footprint, cutting cost and idle resource waste
- Integrated Observability: Friendli Container exposes real-time metrics in the Prometheus text format, unlocking useful insights into latency, throughput, cache hit ratios, and more
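For example, a minimal Prometheus scrape job for a pod exposing these metrics might look like the sketch below; the metrics port and path are assumptions made for illustration, so consult the container documentation for the actual values.

scrape_configs:
  - job_name: friendli-container
    metrics_path: /metrics              # assumed text-format metrics endpoint
    static_configs:
      - targets: ['<POD_IP>:8281']      # hypothetical metrics port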
Pricing
Dimension | Description | Cost/host/hour |
---|---|---|
NVIDIA B200 | GPU hours running NVIDIA B200 | $1.25 |
NVIDIA H200 | GPU hours running NVIDIA H200 | $1.00 |
NVIDIA H100 | GPU hours running NVIDIA H100 | $1.00 |
NVIDIA A100 | GPU hours running NVIDIA A100 | $0.75 |
NVIDIA L40S | GPU hours running NVIDIA L40S | $0.30 |
NVIDIA A10G | GPU hours running NVIDIA A10G | $0.20 |
NVIDIA L4 | GPU hours running NVIDIA L4 | $0.20 |
Vendor refund policy
We do not currently offer refunds.
Delivery details
Amazon EKS console add-on
- Amazon EKS
EKS add-on
An add-on is software that provides supporting operational capabilities to Kubernetes applications but isn't specific to the application. This includes software like observability agents or Kubernetes drivers that allow the cluster to interact with underlying AWS resources for networking, compute, and storage. Add-on software is typically built and maintained by the Kubernetes community, cloud providers like AWS, or third-party vendors.
Amazon EKS add-ons provide installation and management of a curated set of add-ons for Amazon EKS clusters. All Amazon EKS add-ons include the latest security patches and bug fixes, and are validated by AWS to work with Amazon EKS. Amazon EKS add-ons allow you to consistently ensure that your Amazon EKS clusters are secure and stable, and reduce the amount of work needed to install, configure, and update add-ons.
Version release notes
- Added support for Kubernetes 1.33
- Added option to retain the HuggingFace cache on a hostPath
- Added option to use an existing PVC for the base model checkpoint
- Added a convenience method to turn any AWS S3 bucket into a read-only PVC
Additional details
Usage instructions
Step 1: Add a GPU Node Group to your EKS Cluster
Step 2: Install Friendli Container EKS add-on
Step 3: Configure IAM roles for service accounts (IRSA) so that the Kubernetes ServiceAccount that runs your inference pods can exercise the AWSMarketplaceMeteringFullAccess policy on your behalf.
Step 4 (Optional): To deploy a private or gated model from the HuggingFace model hub, create a Kubernetes Secret containing a token with read permission for the model.
Step 5: Create a FriendliDeployment object on the Kubernetes cluster with the model you want to run, the name of the Node Group you created in Step 1, and, optionally, the name of the Secret you created in Step 4.
See https://github.com/friendliai/examples/tree/main/aws/eks-addon for detailed instructions and examples, or read on.
Step 1: Add a GPU Node Group to your EKS Cluster
Open the Amazon EKS console, choose the cluster in which you want to create a node group, select the "Compute" tab, and click "Add node group". Friendli Container supports NVIDIA L4, A10G, L40S, A100, H100, H200, and B200 GPUs.
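If you prefer the command line over the console, a rough eksctl equivalent is sketched below; the node group name, instance type, and sizes are assumptions, so pick an instance family that carries one of the supported GPUs (for example, g5 for A10G or p5 for H100).

# Hypothetical node group; adjust the instance type and counts to your workload.
eksctl create nodegroup --cluster <CLUSTER> --region <REGION> --name friendli-gpu --node-type g5.2xlarge --nodes 1 --nodes-min 1 --nodes-max 2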
Step 2: Install Friendli Container EKS add-on
Open the Amazon EKS console, choose the cluster that you want to configure, select the "Add-ons" tab, and click "Get more add-ons". Scroll down to the "AWS Marketplace add-ons" section, then search for and check "Friendli Container".
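If you prefer the AWS CLI, the installation can be sketched as follows; the exact add-on name is an assumption here, so list the available add-on names first and use the entry that matches Friendli Container.

# List add-on names visible to your account, then install the matching one.
aws eks describe-addon-versions --query 'addons[].addonName' --output text
aws eks create-addon --cluster-name <CLUSTER> --region <REGION> --addon-name <FRIENDLI_ADDON_NAME>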
We also recommend installing the following EKS add-ons from AWS: Amazon VPC CNI, CoreDNS, kube-proxy, and Amazon EKS Pod Identity Agent.
Step 3: Configure IAM roles for service accounts (IRSA)
This step allows the Kubernetes ServiceAccount to contact AWS Marketplace for license validation. The ServiceAccount that runs your inference pods must be able to exercise the AWSMarketplaceMeteringFullAccess policy on your behalf.
First, create an IAM OIDC provider for the cluster. Using eksctl, execute
eksctl utils associate-iam-oidc-provider --region <REGION> --cluster <CLUSTER> --approve
The steps are detailed at https://docs.aws.amazon.com/eks/latest/userguide/enable-iam-roles-for-service-accounts.html
Second, assign an IAM role to the Kubernetes ServiceAccount.
If you want to use the "default" ServiceAccount in the "default" namespace to run the inference pods, execute
eksctl create iamserviceaccount --region <REGION> --cluster <CLUSTER> --namespace default --name default --role-name AWSMarketplaceMeteringAccessForFriendliContainer --attach-policy-arn arn:aws:iam::aws:policy/AWSMarketplaceMeteringFullAccess --approve --override-existing-serviceaccounts
The steps are detailed at https://docs.aws.amazon.com/eks/latest/userguide/associate-service-account-role.html
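To verify that the role was attached, inspect the ServiceAccount's annotations (a quick check, assuming the "default"/"default" account from the command above):

kubectl get serviceaccount default -n default -o yaml

The output should contain an eks.amazonaws.com/role-arn annotation referencing the AWSMarketplaceMeteringAccessForFriendliContainer role created above.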
Step 4 (Optional): Create a secret with a HuggingFace token.
To deploy a private or gated model from HuggingFace, you need to create an access token with "read" permission on the model. Assuming you want to create a secret named "hf-secret", execute
kubectl create secret generic hf-secret --from-literal token=<YOUR_TOKEN_HERE>
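To double-check the stored token, you can read the secret back and decode it (assuming the "hf-secret" name from above):

kubectl get secret hf-secret -o jsonpath='{.data.token}' | base64 --decode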
Step 5: Create a FriendliDeployment object on the Kubernetes cluster.
Write a FriendliDeployment specification. An example can be found at https://github.com/friendliai/examples/tree/main/aws/eks-addon
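For orientation, a shape-level sketch of such a specification is shown below; every field name in the spec block is an assumption made for illustration, so treat the examples repository above as the authoritative schema. Save the file as, for example, friendli-deployment.yaml.

apiVersion: friendli.ai/v1            # hypothetical API group/version
kind: FriendliDeployment
metadata:
  name: my-llm
spec:
  model: <HF_MODEL_ID>                # HuggingFace model to serve
  nodeGroup: <GPU_NODE_GROUP>         # Node Group created in Step 1 (assumed field name)
  hfSecretName: hf-secret             # Secret from Step 4 (assumed field name)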
Then create the object using the kubectl tool:
kubectl apply -f friendli-deployment.yaml
After the deployment is done, you can port-forward to the Kubernetes Service that was created and execute inference requests.
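As a smoke test, the request flow might look like the sketch below; the Service name, port, and the OpenAI-compatible chat path are assumptions, so list the Services the add-on created first.

# Find the Service, forward a local port to it, then send a test request.
kubectl get svc
kubectl port-forward svc/<FRIENDLI_SERVICE> 8000:<SERVICE_PORT>
curl http://localhost:8000/v1/chat/completions -H 'Content-Type: application/json' -d '{"messages": [{"role": "user", "content": "Hello!"}]}'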
Support
Vendor support
You can reach our support staff at support@friendli.ai.
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.