Llama 3.1 8B Instruct Friendli Container

An efficient, fast, and reliable generative AI inference solution for the Llama 3.1 8B Instruct Int8 model

Overview

Imagine a powerful racecar (a generative AI model) that needs a lot of maintenance and tuning (infrastructure and technical know-how). Friendli Container in SageMaker is like a rental service that takes care of the hassle so you can just drive. It provides a simple interface that connects you to Friendli Engine, a high-performance, cost-effective inference serving engine optimized for generative AI models.

This product is built with Llama. Llama 3.1 is licensed under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved. The license is available at: https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE

Highlights

Access popular open-source models: Get started with pre-loaded models (Llama 3.1 8B Instruct). No need to worry about downloading or optimizing them.
Build your own workflows: Integrate these models into your applications with just a few lines of code. Generate creative text formats such as code, musical pieces, emails, and letters with ease.
Focus on what matters: Forget about infrastructure setup and GPU optimization. Friendli Container handles the heavy lifting, freeing you to focus on your creative vision and application development.

Details

Sold by

FriendliAI

Pricing

Free trial

Try this product free for 7 days according to the free trial terms set by the vendor.

Pricing is based on actual usage, with charges varying according to how much you consume. Subscriptions have no end date and may be canceled any time.

Additional AWS infrastructure costs may apply. Use the AWS Pricing Calculator to estimate your infrastructure costs.

Usage costs (6)

Dimension	Description	Cost/host/hour
ml.g5.xlarge Inference (Batch) (recommended)	Model inference on the ml.g5.xlarge instance type, batch mode	$0.15
ml.g5.xlarge Inference (Real-Time) (recommended)	Model inference on the ml.g5.xlarge instance type, real-time mode	$0.15
ml.g5.8xlarge Inference (Real-Time)	Model inference on the ml.g5.8xlarge instance type, real-time mode	$0.15
ml.g5.2xlarge Inference (Real-Time)	Model inference on the ml.g5.2xlarge instance type, real-time mode	$0.15
ml.g5.4xlarge Inference (Real-Time)	Model inference on the ml.g5.4xlarge instance type, real-time mode	$0.15
ml.g5.16xlarge Inference (Real-Time)	Model inference on the ml.g5.16xlarge instance type, real-time mode	$0.15
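
The table lists a flat software rate per host per hour, so the software portion of a bill can be estimated with simple arithmetic. The following is a minimal sketch, not an official calculator: the function name is hypothetical, it assumes billing in whole hours, and it does not cover SageMaker infrastructure charges, which are billed separately.

```python
from decimal import Decimal

# Software rate from the usage-cost table above (USD per host per hour).
SOFTWARE_RATE_PER_HOST_HOUR = Decimal("0.15")


def estimate_software_cost(hours: int, hosts: int = 1) -> Decimal:
    """Estimate the software charge in USD for `hosts` instances over `hours` hours.

    Assumption: whole-hour billing. SageMaker infrastructure costs are
    charged separately and are NOT included in this figure.
    """
    if hours < 0 or hosts < 1:
        raise ValueError("hours must be >= 0 and hosts must be >= 1")
    return SOFTWARE_RATE_PER_HOST_HOUR * hours * hosts


if __name__ == "__main__":
    # One real-time endpoint instance left running for a 30-day month (720 hours).
    print(estimate_software_cost(hours=720))
```

Decimal is used instead of float so that currency amounts do not pick up binary rounding noise.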

Vendor refund policy

We do not currently offer refunds.

Legal

Vendor terms and conditions

Upon subscribing to this product, you must acknowledge and agree to the terms and conditions outlined in the vendor's End User License Agreement (EULA).

Content disclaimer

Vendors are responsible for their product descriptions and other product content. AWS does not warrant that vendors' product descriptions or other product content are accurate, complete, reliable, current, or error-free.

Usage information

Delivery details

Amazon SageMaker model

An Amazon SageMaker model package is a pre-trained machine learning model ready to use without additional training. Use the model package to create a model on Amazon SageMaker for real-time inference or batch processing. Amazon SageMaker is a fully managed platform for building, training, and deploying machine learning models at scale.
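
Creating a model and endpoint from a subscribed model package can be sketched with boto3 as follows. This is an illustration under stated assumptions rather than the vendor's documented procedure: the helper names, the resource name, and the ARNs you pass in are placeholders, and the request shapes follow the standard SageMaker `create_model`, `create_endpoint_config`, and `create_endpoint` calls. Network isolation is enabled here on the assumption that it is required for Marketplace model packages.

```python
def build_deployment_requests(model_package_arn, role_arn, name,
                              instance_type="ml.g5.xlarge"):
    """Build the three request dicts needed to deploy a model package.

    All argument values are caller-supplied placeholders; the instance type
    defaults to the one the listing marks as recommended.
    """
    create_model = {
        "ModelName": name,
        "ExecutionRoleArn": role_arn,
        # Reference the subscribed Marketplace model package by ARN.
        "PrimaryContainer": {"ModelPackageName": model_package_arn},
        # Assumption: Marketplace model packages run with network isolation.
        "EnableNetworkIsolation": True,
    }
    create_endpoint_config = {
        "EndpointConfigName": name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": name,
            "InstanceType": instance_type,
            "InitialInstanceCount": 1,
        }],
    }
    create_endpoint = {"EndpointName": name, "EndpointConfigName": name}
    return create_model, create_endpoint_config, create_endpoint


def deploy(model_package_arn, role_arn, name):
    """Send the requests to SageMaker (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    sagemaker = boto3.client("sagemaker")
    model, config, endpoint = build_deployment_requests(
        model_package_arn, role_arn, name)
    sagemaker.create_model(**model)
    sagemaker.create_endpoint_config(**config)
    sagemaker.create_endpoint(**endpoint)
```

Keeping request construction separate from the API calls makes it possible to inspect or log the requests before anything billable is created.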

Deploy the model on Amazon SageMaker AI using the following options:

Real-time inference

Deploy the model as an API endpoint for your applications. When you send data to the endpoint, SageMaker processes it and returns the results in the API response. The endpoint runs continuously until you delete it, and you are billed for software and SageMaker infrastructure costs while it runs. AWS Marketplace models don't support Amazon SageMaker Asynchronous Inference. For more information, see Deploy models for real-time inference.
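
Calling a deployed endpoint might look like the sketch below. It assumes an endpoint already exists under a name you supply, uses the standard `invoke_endpoint` call of the boto3 `sagemaker-runtime` client, and sends a chat-style JSON body like the sample input on this page. The helper names are hypothetical, and parsing the response as JSON is an assumption based on the stated OpenAI compatibility.

```python
import json


def build_invoke_args(endpoint_name, user_prompt,
                      system_prompt="You are a friendly AI assistant.",
                      temperature=0.7):
    """Build keyword arguments for sagemaker-runtime invoke_endpoint."""
    payload = {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "temperature": temperature,
    }
    return {
        "EndpointName": endpoint_name,
        "ContentType": "application/json",
        "Body": json.dumps(payload),
    }


def invoke(endpoint_name, user_prompt):
    """Send one chat request (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so build_invoke_args works without boto3

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        **build_invoke_args(endpoint_name, user_prompt))
    # Assumption: the container returns a JSON document in the response body.
    return json.loads(response["Body"].read())
```

Remember to delete the endpoint when you are done, since billing continues for as long as it runs.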

Batch transform

Deploy the model to process batches of data stored in Amazon Simple Storage Service (Amazon S3). SageMaker runs the job, processes your data, and returns results to Amazon S3. When complete, SageMaker stops the model. You're billed for software and SageMaker infrastructure costs only during the batch job. Duration depends on your model, instance type, and dataset size. AWS Marketplace models don't support Amazon SageMaker Asynchronous Inference. For more information, see Batch transform for inference with Amazon SageMaker AI.
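
A batch transform job can be started with the standard boto3 `create_transform_job` call. The sketch below only shows the request shape: the job name, model name, and S3 URIs are placeholders you would replace, the helper names are hypothetical, and `application/json` is assumed as the content type because it is one of the input MIME types listed on this page.

```python
def build_transform_job_request(job_name, model_name, input_s3_uri,
                                output_s3_uri, instance_type="ml.g5.xlarge"):
    """Build the request dict for SageMaker create_transform_job.

    The default instance type is the only batch-mode type in the pricing table.
    """
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": input_s3_uri,
                }
            },
            # Assumption: input objects are JSON request payloads.
            "ContentType": "application/json",
        },
        "TransformOutput": {"S3OutputPath": output_s3_uri},
        "TransformResources": {
            "InstanceType": instance_type,
            "InstanceCount": 1,
        },
    }


def start_transform_job(**kwargs):
    """Submit the job (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder works without boto3

    sagemaker = boto3.client("sagemaker")
    return sagemaker.create_transform_job(
        **build_transform_job_request(**kwargs))
```

Because SageMaker stops the model when the job completes, there is no endpoint to clean up afterwards.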

Version release notes

Initial version: Llama-3.1-8B-Instruct Int8

Additional details

Inputs

Summary: Input request payloads are compatible with OpenAI's Chat Completion endpoint.

Limitations for input type: The combined length of input and generated output tokens is limited to a maximum context length of 128k tokens.
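
Because input and output tokens share one context window, it can help to compute how many tokens remain for generation before sending a request. This is a minimal sketch with hypothetical helper names. It reads "128k" as 128,000 tokens, which is an assumption (the exact limit enforced by the container may differ), and it takes the prompt token count as an input rather than tokenizing text itself.

```python
# Assumption: "128k" is interpreted as 128,000 tokens. Adjust if the
# container enforces a different exact limit.
ASSUMED_CONTEXT_LIMIT = 128_000


def max_tokens_budget(prompt_tokens: int,
                      context_limit: int = ASSUMED_CONTEXT_LIMIT) -> int:
    """Return the largest max_tokens value that still fits in the context."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(context_limit - prompt_tokens, 0)


def fits_context(prompt_tokens: int, max_tokens: int,
                 context_limit: int = ASSUMED_CONTEXT_LIMIT) -> bool:
    """Check that input tokens plus requested output tokens fit the limit."""
    return prompt_tokens + max_tokens <= context_limit
```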

Input MIME type: text/csv, application/json, application/jsonlines

Real-time inference sample input data

{
  "messages": [
    { "role": "system", "content": "You are a friendly AI assistant." },
    { "role": "user", "content": "Please explain Python language." }
  ],
  "temperature": 0.7
}

Batch transform sample input data

https://github.com/friendliai/examples/tree/main/aws/sagemaker/input.json

Input data descriptions

The following table describes supported input data fields for real-time inference and batch transform.

Field name	Description	Constraints	Required
messages	A list of messages comprising the conversation so far. Array[role, content, name]. role (string): The role of the message's author. Possible values: [system, user]. content (string): The content of the message. name (string): The name of the participant, used to distinguish between participants with the same role.	Type: FreeText Limitations: Role must be one of [system, user]	Yes
frequency_penalty	Number between -2.0 and 2.0. Positive values penalize tokens that have been sampled, taking into account their frequency in the preceding text. This penalty reduces the model's tendency to reproduce identical lines verbatim.	Default value: null Type: Continuous Minimum: -2.0 Maximum: 2.0	No
presence_penalty	Number between -2.0 and 2.0. Positive values penalize tokens that have been sampled at least once in the existing text.	Default value: null Type: Continuous Minimum: -2.0 Maximum: 2.0	No
repetition_penalty	Penalizes tokens that have already appeared in the generated result (plus the input tokens). Should be greater than or equal to 1.0; 1.0 means no penalty. This is similar to the repetition_penalty argument in Hugging Face Transformers.	Default value: null Type: Continuous Minimum: 1.0	No
max_tokens	The maximum number of tokens to generate. The length of your input tokens plus max_tokens should not exceed the model's maximum length. This is similar to the max_new_tokens argument in Hugging Face Transformers.	Default value: null Type: Integer	No
n	The number of independently generated results for the prompt. Not supported when using beam search. Defaults to 1. This is similar to the num_return_sequences argument in Hugging Face Transformers.	Default value: 1 Type: Integer Minimum: 1	No
stop	When one of the stop phrases appears in the generation result, the API will stop generation. The stop phrases are excluded from the result.	Default value: null Type: FreeText Limitations: string list	No
temperature	Sampling temperature. A smaller temperature makes the generation result closer to greedy (argmax, i.e., top_k = 1) sampling. Defaults to 1.0. This is similar to the temperature argument in Hugging Face Transformers.	Default value: 1.0 Type: Continuous	No
top_p	Tokens comprising the top top_p probability mass are kept for sampling. Numbers between 0.0 (exclusive) and 1.0 (inclusive) are allowed. Defaults to 1.0. This is similar to the top_p argument in Hugging Face Transformers.	Default value: 1.0 Type: Continuous	No
top_k	The number of highest probability tokens to keep for sampling. Numbers between 0 and the vocab size of the model (both inclusive) are allowed. The default value is 0, which means that the API does not apply top-k filtering. This is similar to the top_k argument in Hugging Face Transformers.	Default value: null Type: Integer Minimum: 1	No
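
The constraints in the table can be checked on the client before a request is sent, which avoids paying for a round trip that the endpoint would reject. The following is a hedged sketch: `validate_request` is a hypothetical helper, not part of any Friendli or AWS SDK, and it covers only the constraints stated unambiguously above. top_k is left unchecked because the table's description and constraint columns disagree about its minimum and default.

```python
from typing import List

ALLOWED_ROLES = {"system", "user"}


def validate_request(payload: dict) -> List[str]:
    """Return a list of constraint violations (empty if none were found).

    Numeric fields are assumed to hold numbers when present.
    """
    errors = []

    messages = payload.get("messages")
    if not isinstance(messages, list) or not messages:
        errors.append("messages is required and must be a non-empty list")
    else:
        for i, message in enumerate(messages):
            role = message.get("role") if isinstance(message, dict) else None
            if role not in ALLOWED_ROLES:
                errors.append(f"messages[{i}].role must be one of [system, user]")

    for field in ("frequency_penalty", "presence_penalty"):
        value = payload.get(field)
        if value is not None and not -2.0 <= value <= 2.0:
            errors.append(f"{field} must be between -2.0 and 2.0")

    repetition_penalty = payload.get("repetition_penalty")
    if repetition_penalty is not None and repetition_penalty < 1.0:
        errors.append("repetition_penalty must be >= 1.0")

    n = payload.get("n")
    if n is not None and n < 1:
        errors.append("n must be >= 1")

    top_p = payload.get("top_p")
    if top_p is not None and not 0.0 < top_p <= 1.0:
        errors.append("top_p must be in the range (0.0, 1.0]")

    return errors
```

Returning a list of messages rather than raising on the first problem lets a caller report every violation at once.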

Resources

Vendor resources

Visit FriendliAI website

Support

Vendor support

FriendliAI has been developing key technologies to enable serving generative AI models. Friendli, our flagship generative AI serving engine, is available to use.

If you have any problems with the form, please contact us here.

Get support

AWS infrastructure support

AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.

Get support

Similar products

GPU Supported DeepSeek & Llama powered All-in-One LLM Suite

By Techlatest.net

This product has charges associated with it for seller support. Run & Manage latest LLMs locally, privately, securely and cost-effectively without any vendor lock-in. This VM solution comes with GPU support, pre-loaded with LLaMA, Mistral, Gemma, DeepSeek, & Qwen models along with Open-WebUI as an intuitive UI to interact with the LLMs and Ollama to install new models as needed.

View product

DeepSeek & Llama powered All-in-One LLM Suite

By Techlatest.net

This product has charges associated with it for seller support. Run & Manage latest LLMs locally, privately, securely and cost-effectively without any vendor lock-in. This VM solution comes pre-loaded with LLaMA, Mistral, Gemma, DeepSeek, & Qwen models along with Open-WebUI as an intuitive UI to interact with the LLMs and Ollama to install new models as needed.

View product

LLaMa 3 Meta AI 70B: OpenAI API Compatible AMI

By Meetrix.io

This is an OpenAI API compatible repackaged open source product of all new LLaMa 3 Meta AI 70B with optional support from Meetrix.io. With the SSL auto generation and preconfigured OpenAI API, the LLaMa 3 70B AMI is the perfect alternative for costly solutions such as GPT-4. Keep costs low with pay-as-you-go pricing, while gaining access to expert assistance.

View product

LLaMa 2 Meta AI 7B: OpenAI API Compatible AMI

By Meetrix.io

This is an OpenAI API compatible single-click deployment AMI package of LLaMa 2 Meta AI 7B which is tailored for the 7 billion parameter pretrained generative text model. This Amazon Machine Image is easily deployable without devops hassle and fully optimized for developers eager to harness the power of advanced text generation capabilities. With the SSL auto generation and preconfigured OpenAI API, the LLaMa 2 7B AMI is the perfect alternative for costly solutions such as ChatGPT.

View product

Code Llama 34B Instruct v1.0.0: An Advanced AI Tool for Coding

By Meetrix.io

This is a single-click deployment AMI for Code Llama 34B Instruct, which is an AI model built on top of Llama 2, fine-tuned for generating and discussing code. It is a large language model (LLM) that can use text prompts to generate and discuss code. Code Llama is state-of-the-art for publicly available LLMs on coding tasks. It has the potential to make workflows faster and more efficient for developers and lower the barrier to entry for people who are learning to code. Code Llama has the potential to be used as a productivity and educational tool to help programmers write more robust, well-documented software.

View product
