Fast inference has enabled ultra-low-latency coding agents and continues to improve
What is our primary use case?
I use the product for the fastest LLM inference available for Llama 3.1 70B and GLM 4.6.
How has it helped my organization?
We use it to speed up our coding agent on specific tasks. For anything that is latency-sensitive, having a fast model helps.
What is most valuable?
The valuable features of the product are its inference speed and latency.
What needs improvement?
There is room for improvement in supporting more models, as well as in the ability to run our own models on the chips.
For how long have I used the solution?
I have used the solution for one year.
Which solution did I use previously and why did I switch?
I previously used Groq and SambaNova, but I switched because they were serving a speculative-decoding model that was less intelligent than the listed model.
What's my experience with pricing, setup cost, and licensing?
They are more expensive, but if you need speed, then it is the only option right now.
Which other solutions did I evaluate?
I evaluated Groq and SambaNova.
What other advice do I have?
Their support has been helpful. I've had a few outages with them in the past, but they were resolved quickly. I recommend using it for speed and keeping a good fallback plan in place in case there are issues, which is easy to do.
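To make the fallback idea concrete, here is a minimal sketch in Python that tries a fast primary provider first and falls back to a backup on any error; it assumes OpenAI-compatible endpoints, and the base URL, model IDs, and environment variable names are assumptions for illustration, not documented values.

```python
# Fallback sketch: try the fast primary provider first, then a backup.
# Base URL, model IDs, and env var names are assumptions for illustration.
import os
from openai import OpenAI

PROVIDERS = [
    # (client, model) pairs, tried in order
    (OpenAI(base_url="https://api.cerebras.ai/v1",            # assumed endpoint
            api_key=os.environ["CEREBRAS_API_KEY"]), "llama-3.1-70b"),
    (OpenAI(api_key=os.environ["OPENAI_API_KEY"]), "gpt-4o-mini"),  # backup provider
]

def complete(prompt: str) -> str:
    """Return a completion from the first provider that responds."""
    last_error = None
    for client, model in PROVIDERS:
        try:
            resp = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            return resp.choices[0].message.content
        except Exception as exc:        # outage, rate limit, etc.
            last_error = exc
    raise RuntimeError("All providers failed") from last_error
```

The agent calls complete() as usual; a provider outage only adds the latency of one failed request before the backup takes over.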
High-speed parallel inference has transformed quantitative finance decisions and expands model diversity
What is our primary use case?
Our primary use case is high-TPS (tokens per second) burst inference, executed in parallel across many large language models.
How has it helped my organization?
The throughput increase has extended our decision-making time by more than 50 times compared with our previous pipelines, once burst parallelism is accounted for. This has improved end-to-end performance and opened up new use cases in our domain, quantitative finance.
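As a rough illustration of burst parallelism, the sketch below fans the same prompt out to several models at once and collects the answers as they finish; the endpoint, model IDs, and environment variable are assumptions for illustration, not documented values.

```python
# Burst-parallel inference sketch: one prompt, many models, all in flight
# at the same time. Endpoint, model IDs, and env var are assumptions.
import os
from concurrent.futures import ThreadPoolExecutor, as_completed
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",        # assumed endpoint
    api_key=os.environ["CEREBRAS_API_KEY"],        # hypothetical env var
)
MODELS = ["llama-3.1-70b", "llama-3.1-8b"]         # hypothetical model IDs

def ask(model: str, prompt: str) -> tuple[str, str]:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return model, resp.choices[0].message.content

def burst(prompt: str) -> dict[str, str]:
    """Send the prompt to every model in parallel and gather the replies."""
    with ThreadPoolExecutor(max_workers=len(MODELS)) as pool:
        futures = [pool.submit(ask, m, prompt) for m in MODELS]
        return dict(f.result() for f in as_completed(futures))
```

Because the requests run concurrently, the wall-clock time for a burst is close to that of the slowest single call rather than the sum of all of them.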
What is most valuable?
The most valuable features for us are the speed (TPS) and the diversity of models.
What needs improvement?
There is room for improvement in the integration with AWS Bedrock.
For how long have I used the solution?
We have been using the solution since its launch on AWS.
Which solution did I use previously and why did I switch?
We previously used a combination of Bedrock and local LLM compute.
Which other solutions did I evaluate?
We evaluated alternative solutions such as Groq, Bedrock, local inference, and lambda.ai.
What other advice do I have?
I recommend giving it a try!
Has enabled faster token inference to improve customer response times
What is our primary use case?
I use it for fast LLM token inference.
How has it helped my organization?
Cerebras' token generation speeds are unmatched. This can enable us to provide much faster customer experiences.
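As an illustration of how fast token generation turns into a faster customer experience, here is a minimal streaming sketch that forwards tokens as soon as they arrive; the endpoint, model ID, and environment variable are assumptions for illustration.

```python
# Streaming sketch: surface tokens to the customer as they are generated.
# Endpoint, model ID, and env var are assumptions for illustration.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",    # assumed endpoint
    api_key=os.environ["CEREBRAS_API_KEY"],    # hypothetical env var
)

def stream_reply(prompt: str) -> None:
    stream = client.chat.completions.create(
        model="llama-3.1-70b",                 # hypothetical model ID
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        # Print each token fragment immediately instead of waiting
        # for the full completion.
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
    print()
```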
What is most valuable?
One of the most valuable features is the very fast token inference.
For how long have I used the solution?
I have used the solution for one week.
Which solution did I use previously and why did I switch?
I currently use most of the top models from Google, OpenAI, Anthropic, and Meta.
What's my experience with pricing, setup cost, and licensing?
I have no advice to give regarding setup cost.
Which other solutions did I evaluate?
I also considered Sonnet, GPT, Gemini, and Scout.
What other advice do I have?
Cerebras has a great collection of team members who genuinely want to help you get up and going.