
Overview
Imagine there is a powerful racecar (a generative AI model) that needs much maintenance and tuning (infrastructure and technical know-how). Friendli Container in SageMaker is like a rental service, taking care of the hassle so you can just drive! It provides a simple interface that connects you to Friendli Engine, a high-performance, cost-effective inference serving engine optimized for generative AI models.
This product is built with Llama. Llama 3.1 is licensed under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved. The license is available at: https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSEÂ
Highlights
- Access popular open-source models: Get started with pre-loaded models(Llama 3.1 8B Instruct). No need to worry about downloading or optimizing them.
- Build your own workflows: Integrate these models into your applications with just a few lines of code. Generate creative text formats, code, musical pieces, email, letters, etc. and create stunning images with ease.
- Focus on what matters: Forget about infrastructure setup and GPU optimization. Friendli Container handles the heavy lifting, freeing you to focus on your creative vision and application development.
Details
Unlock automation with AI agent solutions

Features and programs
Financing for AWS Marketplace purchases
Pricing
Free trial
Dimension | Description | Cost/host/hour |
|---|---|---|
ml.g5.xlarge Inference (Batch) Recommended | Model inference on the ml.g5.xlarge instance type, batch mode | $0.15 |
ml.g5.xlarge Inference (Real-Time) Recommended | Model inference on the ml.g5.xlarge instance type, real-time mode | $0.15 |
ml.g5.8xlarge Inference (Real-Time) | Model inference on the ml.g5.8xlarge instance type, real-time mode | $0.15 |
ml.g5.2xlarge Inference (Real-Time) | Model inference on the ml.g5.2xlarge instance type, real-time mode | $0.15 |
ml.g5.4xlarge Inference (Real-Time) | Model inference on the ml.g5.4xlarge instance type, real-time mode | $0.15 |
ml.g5.16xlarge Inference (Real-Time) | Model inference on the ml.g5.16xlarge instance type, real-time mode | $0.15 |
Vendor refund policy
We do not support any refunds currently.
How can we make this page better?
Legal
Vendor terms and conditions
Content disclaimer
Delivery details
Amazon SageMaker model
An Amazon SageMaker model package is a pre-trained machine learning model ready to use without additional training. Use the model package to create a model on Amazon SageMaker for real-time inference or batch processing. Amazon SageMaker is a fully managed platform for building, training, and deploying machine learning models at scale.
Version release notes
Initial version: Llama-3.1-8B-Instruct Int8
Additional details
Inputs
- Summary
The input request payloads that are compatible with OpenAI's Chat Completion endpoint.
- Limitations for input type
- For input and generated output tokens, a maximum context length is 128k
- Input MIME type
- text/csv, application/json, application/jsonlines
Input data descriptions
The following table describes supported input data fields for real-time inference and batch transform.
Field name | Description | Constraints | Required |
|---|---|---|---|
messages | A list of messages comprising the conversation so far.
Array[role, content, name]
role(string): The role of the messages author. Possible values: [system, user]
content(string): The content of message.
name(string): The name for the participant to distinguish between participants with the same role. | Type: FreeText
Limitations: Role must be the one of [system, user] | Yes |
frequency_penalty | Number between -2.0 and 2.0. Positive values penalizes tokens that have been sampled, taking into account their frequency in the preceding text. This penalization diminishes the model's tendency to reproduce identical lines verbatim. | Default value: null
Type: Continuous
Minimum: -2.0
Maximum: 2.0 | No |
presence_penalty | Number between -2.0 and 2.0. Positive values penalizes tokens that have been sampled at least once in the existing text. | Default value: null
Type: Continuous
Minimum: -2.0
Maximum: 2.0 | No |
repetition_penalty | Penalizes tokens that have already appeared in the generated result (plus the input tokens). should be greater than or equal to 1.0. 1.0 means no penalty. This is similar to Hugging Face transformer's repetition_penalty argument. | Default value: null
Type: Continuous
Minimum: 1.0 | No |
max_tokens | The maximum number of tokens to generate. The length of your input tokens plus max_tokens should not exceed the model's maximum length. This is similar to Hugging Face transformer's max_new_tokens argument. | Default value: null
Type: Integer | No |
n | The number of independently generated results for the prompt. Not supported when using beam search. Defaults to 1. This is similar to Hugging Face transformer's num_return_sequences argument. | Default value: 1
Type: Integer
Minimum: 1 | No |
stop | When one of the stop phrases appears in the generation result, the API will stop generation. The stop phrases are excluded from the result. | Default value: null
Type: FreeText
Limitations: string list | No |
temperature | Sampling temperature. Smaller temperature makes the generation result closer to greedy, argmax (i.e., top_k = 1) sampling. defaults to 1.0. This is similar to Hugging Face transformer's temperature argument. | Default value: 1.0
Type: Continuous | No |
top_p | Tokens comprising the top top_p probability mass are kept for sampling. Numbers between 0.0 (exclusive) and 1.0 (inclusive) are allowed. Defaults to 1.0. This is similar to Hugging Face transformer's top_p argument. | Default value: 1.0
Type: Continuous | No |
top_k | The number of highest probability tokens to keep for sampling. Numbers between 0 and the vocab size of the model (both inclusive) are allowed. The default value is 0, which means that the API does not apply top-k filtering. This is similar to Hugging Face transformer's top_k argument. | Default value: null
Type: Integer
Minimum: 1 | No |
Resources
Vendor resources
Support
Vendor support
FriendliAI has been developing key technologies to enable serving generative AI models. Friendli, our flagship generative AI serving engine is available to use.
If you have any problems with the form, please contact us here.
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.
Similar products


