    Cohere Embed Light Model v3 - English

    Sold by: Cohere
    Deployed on AWS
    Free Trial

    Cohere provides a representation AI model, Embed Light, that translates text and images into numerical vectors that models can understand.

    Overview

    Embed Light translates text and images into numerical vectors that models can understand. The most advanced generative AI apps rely on high-performing embedding models to understand the nuances of user inputs, search results, and documents. Embed Light is a smaller version of Embed with 384 dimensions. Please note that as of July 2025, the minimum requirements to use this model are CUDA driver 12.2 and NVIDIA driver 535.

    Highlights

    • Embed Light translates text and images into numerical vectors that models can understand. Embed Light is a smaller version of Embed with 384 dimensions.
    • Our optimized containers enable low-latency inference on a diverse set of hardware accelerators available on AWS, providing different cost and performance points for SageMaker customers.
    • Embeddings, Semantic Search, Retrieval-Augmented Generation (RAG), Text Classification, Clustering
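
    As a sketch of how these embeddings are typically used for semantic search, the snippet below ranks documents by cosine similarity against a query vector. The vectors here are toy 3-dimensional stand-ins for the 384-dimensional vectors the model actually returns, and the document names are illustrative.

    ```python
    import math

    def cosine_similarity(a, b):
        """Cosine similarity between two equal-length vectors."""
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(x * x for x in b))
        return dot / (norm_a * norm_b)

    # Toy stand-ins for embeddings returned by the model.
    query_vec = [0.1, 0.9, 0.2]
    doc_vecs = {
        "doc_a": [0.1, 0.8, 0.3],
        "doc_b": [0.9, 0.1, 0.0],
    }

    # Rank documents by similarity to the query, most similar first.
    ranked = sorted(doc_vecs, key=lambda d: cosine_similarity(query_vec, doc_vecs[d]),
                    reverse=True)
    print(ranked[0])  # doc_a is closest to the query
    ```

    For retrieval, the corpus would be embedded with input_type search_document and the query with search_query, as described in the input details below.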

    Details

    Sold by: Cohere
    Delivery method: Amazon SageMaker model
    Deployed on AWS


    Features and programs

    Financing for AWS Marketplace purchases

    AWS Marketplace now accepts line of credit payments through the PNC Vendor Finance program. This program is available to select AWS customers in the US, excluding NV, NC, ND, TN, & VT.

    Pricing

    Free trial

    Try this product free for 7 days according to the free trial terms set by the vendor.

    Cohere Embed Light Model v3 - English

    Pricing is based on actual usage, with charges varying according to how much you consume. Subscriptions have no end date and may be canceled any time.
    Additional AWS infrastructure costs may apply. Use the AWS Pricing Calculator to estimate your infrastructure costs.

    Usage costs (6)

    Dimension (description): cost/host/hour
    ml.g4dn.12xlarge Inference (Batch), Recommended (model inference on the ml.g4dn.12xlarge instance type, batch mode): $19.80
    ml.g5.xlarge Inference (Real-Time), Recommended (model inference on the ml.g5.xlarge instance type, real-time mode): $5.71
    ml.p3.2xlarge Inference (Real-Time) (model inference on the ml.p3.2xlarge instance type, real-time mode): $15.49
    ml.g5.2xlarge Inference (Real-Time) (model inference on the ml.g5.2xlarge instance type, real-time mode): $6.16
    ml.g4dn.xlarge Inference (Real-Time) (model inference on the ml.g4dn.xlarge instance type, real-time mode): $2.98
    ml.g4dn.2xlarge Inference (Real-Time) (model inference on the ml.g4dn.2xlarge instance type, real-time mode): $3.81
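
    Since a real-time endpoint is billed for every hour it runs, a back-of-the-envelope monthly software cost is simply the listed rate times roughly 720 hours. The helper below is illustrative only; AWS infrastructure costs are billed separately on top of these software rates.

    ```python
    # Rough monthly software cost for a continuously running real-time endpoint,
    # using the per-host-per-hour rates listed above. Infrastructure costs are extra.
    HOURS_PER_MONTH = 24 * 30  # 720-hour approximation

    rates = {
        "ml.g4dn.xlarge": 2.98,
        "ml.g5.xlarge": 5.71,
    }

    def monthly_software_cost(instance_type, hosts=1):
        """Estimated monthly software charge for an always-on endpoint."""
        return rates[instance_type] * hosts * HOURS_PER_MONTH

    print(f"${monthly_software_cost('ml.g4dn.xlarge'):.2f}")  # $2145.60
    ```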

    Vendor refund policy

    No refunds. Please contact support+aws@cohere.com for further assistance.


    Legal

    Vendor terms and conditions

    Upon subscribing to this product, you must acknowledge and agree to the terms and conditions outlined in the vendor's End User License Agreement (EULA).

    Content disclaimer

    Vendors are responsible for their product descriptions and other product content. AWS does not warrant that vendors' product descriptions or other product content are accurate, complete, reliable, current, or error-free.


    Delivery details

    Amazon SageMaker model

    An Amazon SageMaker model package is a pre-trained machine learning model ready to use without additional training. Use the model package to create a model on Amazon SageMaker for real-time inference or batch processing. Amazon SageMaker is a fully managed platform for building, training, and deploying machine learning models at scale.

    Deploy the model on Amazon SageMaker AI using the following options:
    • Real-time inference: Deploy the model as an API endpoint for your applications. When you send data to the endpoint, SageMaker processes it and returns results in the API response. The endpoint runs continuously until you delete it, and you're billed for software and SageMaker infrastructure costs while the endpoint runs. AWS Marketplace models don't support Amazon SageMaker Asynchronous Inference. For more information, see Deploy models for real-time inference.
    • Batch transform: Deploy the model to process batches of data stored in Amazon Simple Storage Service (Amazon S3). SageMaker runs the job, processes your data, and returns results to Amazon S3. When complete, SageMaker stops the model. You're billed for software and SageMaker infrastructure costs only during the batch job. Duration depends on your model, instance type, and dataset size. AWS Marketplace models don't support Amazon SageMaker Asynchronous Inference. For more information, see Batch transform for inference with Amazon SageMaker AI.
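
    As a rough sketch of the real-time option, the snippet below invokes a deployed endpoint with boto3's SageMaker runtime client. The endpoint name is a placeholder: it exists only after you deploy this model package yourself, and the request body follows the JSON format described under Inputs below.

    ```python
    import json

    def build_embed_request(texts, input_type="search_query", truncate="END"):
        """Build the JSON body this model package expects (texts OR images, never both)."""
        return json.dumps({"texts": texts, "input_type": input_type, "truncate": truncate})

    def embed_texts(endpoint_name, texts):
        """Invoke a deployed real-time endpoint; requires AWS credentials and
        an endpoint created from this model package (name is hypothetical)."""
        import boto3  # AWS SDK for Python
        runtime = boto3.client("sagemaker-runtime")
        response = runtime.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType="application/json",
            Body=build_embed_request(texts),
        )
        return json.loads(response["Body"].read())

    body = build_embed_request(["hello", "goodbye"])
    ```

    Usage would be embed_texts("cohere-embed-light-endpoint", ["hello"]), where the endpoint name is whatever you chose at deployment time.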
    Version release notes

    🆕 Features
    • Request Priority: Added a priority field to chat, embed, and rerank requests. High-priority requests are handled first and dropped last when the system is under load, ensuring lower latency and higher availability for high-priority requests when there's a mix of workloads with different latency requirements (e.g., real-time user requests and background batch jobs).

    🐛 Bug Fixes
    • Improved sparse embedding efficiency: Removed padded tokens from sparse embedding responses to reduce unnecessary computation and enhance accuracy for token-sparse inputs.
    • Enhanced similarity calculation: Adopted cosine similarity (cosineSim) for more precise relevance scoring in embedding comparisons.
    • Validated stability: Completed end-to-end testing in production and staging environments to ensure reliability.
    • Temporary parameter limit: Restricted max_n to optimize performance during the initial rollout (to be adjusted in a future update).

    Additional details

    Inputs

    Summary

    The model accepts JSON requests that specify either the input texts or a data URL of a base64-encoded image to be embedded.

    The model does not accept both text and images in the same request. For text:

        {
          "texts": ["hello", "goodbye"],
          "input_type": "search_query",
          "truncate": "END"
        }

    OR, for images:

        {
          "images": ["data:image/png;base64,/9j/4betRXhpZgA....."],
          "input_type": "search_query",
          "truncate": "END"
        }

    (The images entry is an image converted to base64 and formatted as a data URL.)
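
    A minimal sketch of producing such a data URL from a local image file; the file path and MIME type are placeholders you would substitute for your own.

    ```python
    import base64

    def image_to_data_url(path, mime="image/png"):
        """Read an image file and encode it as the data-URL string
        the images field expects: data:<mime>;base64,<payload>."""
        with open(path, "rb") as f:
            b64 = base64.b64encode(f.read()).decode("ascii")
        return f"data:{mime};base64,{b64}"

    # Example request body (path is hypothetical):
    # {"images": [image_to_data_url("photo.png")],
    #  "input_type": "search_query", "truncate": "END"}
    ```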

    Input MIME type
    application/json
    https://github.com/cohere-ai/cohere-aws/blob/main/examples/sample_embed_english_light_3_data.json

    Input data descriptions

    The following describes the supported input data fields for real-time inference and batch transform.

    texts (required)
    An array of strings for the model to embed. Maximum of 1,024 texts per call; we recommend keeping each text under 512 tokens for optimal quality.
    Type: FreeText

    images (optional)
    An array of base64-encoded data URLs, as strings, to embed. Maximum of 1 image per call.
    Type: FreeText. Default value: []

    input_type (required)
    Prepends special tokens that differentiate each input type from the others. Do not mix different types together, with one exception: for search and retrieval, embed your corpus with type search_document and embed your queries with type search_query.
    Type: Categorical. Allowed values: search_document, search_query, classification, clustering

    truncate (optional)
    Specifies how the API handles inputs longer than the maximum token length. LEFT discards the start of the input and RIGHT discards the end; in both cases, input is discarded until the remainder is exactly the model's maximum input token length. With NONE, an input that exceeds the maximum token length returns an error.
    Type: Categorical. Allowed values: NONE, LEFT, RIGHT. Default value: NONE

    embeddings_type (optional)
    Specifies the types of embeddings to return; can be one or more of the allowed values. If unspecified, the float response type is returned.
    Type: Categorical. Allowed values: float, int8, uint8, binary, ubinary
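
    To make those constraints concrete, here is a hedged sketch of a request builder that enforces them before a call is made. The helper and its names are illustrative, not part of any Cohere SDK; the embeddings_type field name follows the table above.

    ```python
    # Allowed values, per the field descriptions above.
    ALLOWED_INPUT_TYPES = {"search_document", "search_query", "classification", "clustering"}
    ALLOWED_TRUNCATE = {"NONE", "LEFT", "RIGHT"}
    ALLOWED_EMBEDDING_TYPES = {"float", "int8", "uint8", "binary", "ubinary"}

    def build_request(texts=None, images=None, input_type="search_document",
                      truncate="NONE", embedding_types=None):
        """Assemble a request body that respects the documented field constraints."""
        if (texts is None) == (images is None):
            raise ValueError("provide exactly one of texts or images, never both")
        if texts is not None and len(texts) > 1024:
            raise ValueError("at most 1024 texts per call")
        if images is not None and len(images) > 1:
            raise ValueError("at most 1 image per call")
        if input_type not in ALLOWED_INPUT_TYPES:
            raise ValueError(f"invalid input_type: {input_type}")
        if truncate not in ALLOWED_TRUNCATE:
            raise ValueError(f"invalid truncate: {truncate}")

        body = {"input_type": input_type, "truncate": truncate}
        body["texts" if texts is not None else "images"] = (
            texts if texts is not None else images
        )
        if embedding_types is not None:
            if not set(embedding_types) <= ALLOWED_EMBEDDING_TYPES:
                raise ValueError("invalid embedding_types")
            body["embeddings_type"] = embedding_types
        return body
    ```

    For example, build_request(texts=["hello"], input_type="search_query") yields a body ready to serialize and send, while passing both texts and images raises an error, mirroring the "never both" rule above.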

    Support

    Vendor support

    AWS infrastructure support

    AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.


    Customer reviews

    Ratings and reviews

    Overall rating: 4 out of 5 stars, from 2 ratings

    5 star: 0%
    4 star: 100%
    3 star: 0%
    2 star: 0%
    1 star: 0%

    2 AWS reviews | 3 external reviews

    Star ratings include only reviews from verified AWS customers. External reviews can also include a star rating, but star ratings from external reviews are not averaged in with the AWS customer star ratings.
    AmanSingh5

    Have improved project workflows using faster response times and reduced data embedding costs

    Reviewed on Nov 01, 2025
    Review from a verified AWS customer

    What is our primary use case?

    I have used Cohere in a RAG use case where I had to vectorize some data. I used multiple models in RAG to find a better model that could give superior results. I was trying to find a cloud-hosted model, and Cohere's Embed English v3.0 is a cloud-hosted model that took less time to embed the textual data. When I was trying to get the similarity search after embedding that data, Cohere provided much better results.

    Let's suppose I had to embed 100 documents at a time. Most other models, including all-MiniLM-L6-v2, took more time when I was trying to embed using that model. When I tried Cohere, it was much faster. I would say it was more than 50 to 60% faster than those models. It was even somewhat faster than text-embedding-3, which is from OpenAI. So Cohere helped to reduce the development time and embedding times.

    What is most valuable?

    I believe Cohere offers excellent features, especially the cloud-hosted model and the API calls. The number of times I can call the API within a minute is very good. The ping is great; I have started a request to Cohere model, and it was very quick to respond. The best part was the free tier because most models do not provide a free tier.

    Regarding benefits, Cohere is less costly than other models. If I talk about OpenAI or Google embedding models, they charge highly compared to Cohere. Regarding the training data, Cohere has the most data embedded or trained with the most English. Cohere's Embed English v3.0 has been trained with much more data than other models, including OpenAI. This gives an extra benefit to my organization.

    What needs improvement?

    One thing that Cohere can improve is related to some distances when I am trying similarity search. Let's suppose I have provided textual data that has been embedded. I have to use some extra process from numpy after embedding the model. In the case of OpenAI embedding models, I do not have to use that extra process, and they provide lower distances compared to my results from Cohere. I was getting distances of approximately 0.005 sometimes, but in the case of Cohere, I was getting distances around 0.5 or sometimes more than that. I think that can be improved. It was possibly because of some configuration or the way I was using it, but I am not exactly sure about that.

    For how long have I used the solution?

    I have been using Cohere for the last seven or eight months.

    What do I think about the scalability of the solution?

    The scalability was very good because of the response time. Even though I do not need that much processing at a time, I have had a good experience with Cohere so far.

    Which solution did I use previously and why did I switch?

    Previously, I was using all-MiniLM-L6-v2 and switched to Cohere because all-MiniLM-L6-v2 needed to be locally deployed. That model was processing locally, and the results I was getting from that model, even though it was open source, I was not satisfied. That is why I switched to Cohere.

    What was our ROI?

    I can highlight two benefits. Cohere charges less than OpenAI, so it saves cost. In the second use case, the timing is significant. Cohere's Embed English model took less time to embed than OpenAI's embedding ada-002 model. In this case, it also saves time. These two benefits I can highlight.

    Which other solutions did I evaluate?

    I have evaluated Cohere's Embed English v3 and OpenAI's text-embedding-3 models. I have evaluated multiple models, and I even evaluated some models from Hugging Face.

    What other advice do I have?

    Cohere provides a free tier, and any developer who is starting their journey can use Cohere for RAG use cases. They can utilize the model benefits. After using Cohere, I got distances after the similarity search that were much lower compared to other vectorization and embedding models. The only model that performed better than Cohere was OpenAI's text-embedding-3-large. It was good, but Cohere was the second-best performing model in my use case.

    I think Cohere's use cases are excellent, and I would suggest Cohere to others because of the less response time and time-saving in the process. It is also cheaper than other models. I would give this review a rating of eight out of ten.

    Which deployment model are you using for this solution?

    On-premises

    If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

    Amazon Web Services (AWS)
    Rustam Sharipov

    Has improved customer interaction speeds and supports flexible model switching

    Reviewed on Oct 31, 2025
    Review provided by PeerSpot

    What is our primary use case?

    My main use case for Cohere is to use a Cohere embedded model to create our own vector databases and check conversations.

    A specific example of how I use Cohere's embedding model for our vector databases or conversation checking involves abilities that take customer approvals and convert that information into vectors. I save this information in our own systems and also store small vectors on customer devices to use during custom customer requests.

    My use case involves indexing and saving small portions of information.

    What is most valuable?

    In my experience, Cohere offers reliable embedding models for customers who do not want to use standard OpenAI models.

    I find that the choice of embedding models is limited, and Cohere was available on Azure, which makes it a good alternative for customers who prefer not to use OpenAI.

    Cohere has positively impacted my organization by helping our customers work more efficiently when creating requests, and the embedding results are of very high quality.

    What needs improvement?

    I believe Cohere can be improved technically by providing more feedback, logs, and metrics for embedding requests, as it currently appears to be a black box without any understanding of quality. Quality can only be understood after using it with customer requests, and during the embedding process, measurable metrics are not visible.

    There are no particularly unique features distinguishing Cohere from other solutions.

    For how long have I used the solution?

    I have been using Cohere for approximately nine to ten months.

    What do I think about the stability of the solution?

    Cohere is stable in my experience.

    What do I think about the scalability of the solution?

    The scalability of Cohere showed that after sending a large amount of information and embeddings, it became slower, though we do not use any special solution for scaling.

    How are customer service and support?

    I have not interacted with Cohere's support team. However, I contacted Azure about the slowness, and we decided to use smaller chunks of information during the embedding process.

    How would you rate customer service and support?

    Neutral

    Which solution did I use previously and why did I switch?

    I previously used embedding models from OpenAI. I switched to Cohere because customers wanted to use something other than OpenAI models.

    How was the initial setup?

    I did not purchase Cohere through the Azure Marketplace. I deployed unmanaged models and shared models.

    What was our ROI?

    I do not have relevant metrics about the return on investment from using Cohere yet because the customer's application is in a pilot stage and has not been released. However, I understand that it is performing well, and we plan to continue with it.

    What's my experience with pricing, setup cost, and licensing?

    My experience with pricing, setup cost, and licensing indicates that it does not require a special license, and the prices are competitive compared to competitors.

    Which other solutions did I evaluate?

    I did not evaluate other options before choosing Cohere. I looked at prices, and since we used Azure cloud, it did not provide many models for selection. Only OpenAI and Cohere were available for embedding.

    What other advice do I have?

    For others looking into using Cohere, I advise that it is a good model for people who want to be agnostic when using models and creating something flexible to switch from one model to another. I would rate this product an eight out of ten.

    Daniel Pan

    Has built key functionality for AI workflows in enterprise applications

    Reviewed on Oct 09, 2025
    Review from a verified AWS customer

    What is our primary use case?

    We founded this company two and a half years ago, and since the middle of 2022, we foresaw the trending of generative AI and large language models, so my startup is working on developing generative AI applications for our clients, including enterprises and a few other startups across America and Canada.

    I started using Cohere when we first got information from the community about their reranking models almost one and a half years ago.

    In some clients' projects, we were required to introduce reranking model in the RAG flow (Retrieval-augmented generation). In this flow, we use different components to allow users to select and pick up from the UI components, drag and drop to their flow to enhance their RAG pipeline. That's where we introduced Cohere models as one of the providers for reranking.

    How has it helped my organization?

    Cohere's reranking model helped us complete this request.

    What is most valuable?

    From our data, I can tell that at least 15% of end users were actively using reranking to enhance their RAG pipeline because we have the UI to indicate that reranking is recommended as it can enhance the quality of the retrieval.

    For clarification, I want to describe this data more clearly. As mentioned, 15% of end users chose to enable this module based on the fact that we have the pricing tier with an extra cost for their API call.

    In general, I'm satisfied with the speed, and I can confirm this because we have the long fields to track all conversations, and we see that this loop for reranking actually costs relatively less time throughout the whole chat flow. Regarding quality, it's hard to tell because we don't have a benchmark. In our enterprise applications, we are trying to build up evaluation pipelines, do AB testing, and other analysis, but it's not a conventional computer science application, so it's very hard to build up evaluation pipelines with objective criteria. It's challenging for us to make a conclusion about quality, but the speed is good.

    A direct benefit of using Cohere's reranking model is that we can tell clients we have this module rather than missing this piece, as reranking is a very important component that companies discuss to enhance RAG quality.

    Although it's not impacting our business model, I'm pushing for the evaluation system because it can expand our business scope. We want to sell our system to clients, and while they may not be aware of evaluation initially, it's beneficial to have. Once we have these systems, we can showcase to end users that employing such a reranking system improves quality. We need proof to convince ourselves that after implementing reranking, we get better quality.

    What needs improvement?

    It would be better to have a dashboard for users to showcase how reranking helps improve quality. When end users choose the service, they want to see the actual output. The evaluation part is challenging for recent large language model applications but remains very important.

    If Cohere could provide a dashboard where we can employ an LLM as a judge to check quality before and after reranking, that would be helpful. We could either have another large language model evaluate this part or allow UAT users to manually check with humans in the middle. As an enterprise provider, we want such features because when chatting with clients, we can demonstrate that employing Cohere's reranking model significantly improves results compared to not using it.

    Documentation is not a major blocking issue for us as we are sophisticated software engineers. Integration and the API provided for reranking models are not complicated, so we can easily handle that. The documentation is good. The major point is to prove the value through evaluation. We need a sophisticated solution to showcase visibly to our clients and engineering team to convince them that using this model creates improvements.

    For how long have I used the solution?

    I started using Cohere when we first got information from the community about their reranking models almost one and a half years ago.

    What do I think about the stability of the solution?

    That's only what we need in our product currently. I will communicate when we have other requirements.

    We haven't had any issues to escalate to Cohere's support because reranking is an optional feature in our product, and we haven't seen any significant issues so far.

    What do I think about the scalability of the solution?

    We don't observe many scaling problems because it's an enterprise application. There are a few hundred people using this. The concurrent user rate is not significant, which might be why we don't see many scaling issues so far.

    How are customer service and support?

    We haven't had any issues to escalate to Cohere's support because reranking is an optional feature in our product, and we haven't seen any significant issues so far.

    How would you rate customer service and support?

    Positive

    Which solution did I use previously and why did I switch?

    For reranking, Cohere was our only solution.

    How was the initial setup?

    I'm more focused on the speed and overall quality of the model itself and the chat flow as a whole solution. That's why I'm not in the position to comment on the price and setup cost as there are DevOps working on this piece.

    What was our ROI?

    It is hard to estimate the overall ROI, but the ROI for the reranking feature specifically is a positive number.

    What's my experience with pricing, setup cost, and licensing?

    I'm not in the position to answer that question because I was not the one who deployed that model, but I believe it is because we see the model name as ARN name, so it's most likely coming from Bedrock.

    Which other solutions did I evaluate?

    For reranking, Cohere is the only solution we have used so far.

    What other advice do I have?

    As a feature developer, I'm more focused on the speed and overall quality of the model itself and the chat flow as a whole solution. That's why I'm not in the position to comment on the price and setup cost as there are DevOps working on this piece. My rating for this solution is 8 out of 10.

    Which deployment model are you using for this solution?

    Public Cloud

    If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

    Amazon Web Services (AWS)
    Gokul Anil

    Has streamlined test creation and analysis while needing better semantic accuracy for specific domain knowledge

    Reviewed on Oct 08, 2025
    Review provided by PeerSpot

    What is our primary use case?

    I am working on test automation, specifically an intelligent test automation framework. Based on the existing framework, which is handled in TypeScript and Selenium, I used Cohere intelligence to create new tests based on the test data and test cases that we provide. It will read through all the test cases in natural language, process them, analyze the internal working of our existing framework, and create the artifacts, test data, and test source based on the existing framework.

    Currently, we are using Cohere APIs. First, I used the chat in the application itself to identify how it works by providing RAG sources, including PDF and text files. After confirming it worked fine, we moved to find an API, and we are using that API to handle all these tasks. The APIs are very functional for all our current use cases, mainly the intelligent test automation.

    What is most valuable?

    Cohere is very useful because I have been in scenarios where code was written with multiple reusable concepts containing many functionalities covered as different functions, but without descriptions of what particular functions were doing. We used Cohere intelligence and its knowledge of Oracle ERP PPM, and it was able to read through all the TypeScript code and create descriptions intelligently, which were almost 90% correct when reviewed.

    It was very useful because we had 500-plus reusables, and it was able to analyze all of them and put them into a catalog. This makes it very easy to find and use the catalog to determine whether existing functionality is already implemented, preventing redundant implementations.

    When it creates a new test, it creates it almost 70 to 80% correctly without errors. The time savings are significant - what previously took one or two days can now be completed in two to three hours maximum. We can complete many more tests in a day or sprint with Cohere's help.

    Along with test automation, we handle analysis tasks, and now we have more time for better analysis. We are planning to implement test analysis capabilities as well. Once you receive the requirements and test cases, you can directly use them as input, and it will generate all artifacts and test data.

    What needs improvement?

    When performing similarity matching between text descriptions and the catalog descriptions created using Cohere, the matching could be improved. Because it does not have extensive understanding of Oracle functionalities in ERP, it sometimes gives wrong results or the confidence score is lower than desired. Improving that understanding would provide better matches.

    When working with Cohere and providing large data sets, there was some hallucination, though it mostly works fine without many issues.

    For how long have I used the solution?

    I have been using Cohere for almost seven to eight months.

    What do I think about the stability of the solution?

    I have not faced any downtime or related issues. It works fine.

    Which solution did I use previously and why did I switch?

    I used Llama but it was not giving results comparable to what I get from Cohere when comparing the two solutions. We only had these two options at that time, and we chose Cohere over Llama.

    How was the initial setup?

    The setup was pretty smooth. I was able to find things easily. The documents were readily available on the internet, and I was able to find and integrate them without any issues. I subscribed to emails about new model updates, which allowed me to stay current. Oracle has now wrapped it inside their own AI, and we are using the latest version of Cohere as our chosen model.

    What about the implementation team?

    I started with the public version and then they wrapped it inside Oracle's system. I believe it is private, only accessible to Oracle employees with proper authentication and sign-in details. The pricing and setup were handled by the organization, so I am not aware of those aspects.

    Which other solutions did I evaluate?

    We only had two options at that time: Llama and Cohere. After trying both, we chose Cohere over Llama.

    What other advice do I have?

    Try it and use it. If you find it worthy, then implement it. I have shared all my experiences with you. My rating for Cohere is 7 out of 10.

    CollinsOmondi

    Support team is available and answers all the questions and it is also free which is good for personal projects

    Reviewed on Jul 02, 2024
    Review provided by PeerSpot

    What is our primary use case?

    I use it for a personal project, a Discord bot for my Discord server. I haven't used it that much, but so far it's amazing. I like the support team. They are very good.

    How has it helped my organization?

    Everything is definitely intuitive, and whenever you have an issue, it's very easy to reach out to them on Discord. They're very active, so I'm not really complaining about having issues.

    What is most valuable?

    The very first thing that I really like about it is the support team. They're really available on Discord, and they answer all of your questions.

    I think it's free for personal projects unless you want to go to production. I haven't really used it that much, but the features that I have used so far, I have no issues with them.

    What needs improvement?

    Cohere has text generation. I think it is mainly focused on AI search. If there was a way to combine the searches with images, I think it would be nice to include that.

    For how long have I used the solution?

    I've recently started exploring Cohere. It has been a few months now, two to three months.

    What do I think about the stability of the solution?

    I'll rate the stability a six out of ten since I haven't been using it much. I haven't really seen any issues.

    What do I think about the scalability of the solution?

    I will rate the scalability a seven out of ten because I haven't explored it all the way.

    How are customer service and support?

    The customer service and support are very good.

    How would you rate customer service and support?

    Positive

    How was the initial setup?

    It's very easy. You just need an API key, and all the configurations are there. It's very easy to start.

    If you have worked with something that requires API keys, you should be good to go. I don't think you need a lot of experience.

    What's my experience with pricing, setup cost, and licensing?

    Cohere has a free tier. You can use the API in development mode, so you can just use it for free. But if you go to production, you will have to pay.

    I would advise someone to really consider it first if they really need it because it can be expensive.

    So it might be a little expensive.

    What other advice do I have?

    Overall, I would rate it a seven out of ten.

    I would recommend it to others because it is very promising, so it would be worth the time. Others should try it.
