
Overview

Deepgram voice AI models power your apps with world-class speech-to-text and domain-specific language models (DSLMs). Effortlessly accurate. Blazing fast. Enterprise-ready scale. Unbeatable pricing. Everything developers need to build with confidence and ship faster.
- Deepgram Datasheet: https://drive.google.com/file/d/1YngGzUJZhnH8nj-ZFuhrSiaLD4w9HFd4/view?usp=sharing
- Deepgram API Playground to try out all features and models (free tier): https://playground.deepgram.com/?smart_format=true&language=en&model=nova
- Deepgram Summarization (domain-specific language model): https://developers.deepgram.com/docs/summarization
- Generative AI demo with partners, "OneReach.ai, Vonage and Deepgram Partner to Revolutionize Conversational AI" (video, 2 min): https://www.youtube.com/watch?v=CFTk0S6tGF8
- Introducing Nova-2: The Fastest, Most Accurate Speech-to-Text API (video, 9 min): https://www.youtube.com/watch?v=PSaVX6ST-FM
For questions and custom quote options, reach out to us at aws@deepgram.com.
Highlights
- Transcription (STT):
  - 20x faster: Transcribe in real time, or transcribe an hour of pre-recorded audio in about 12 seconds.
  - <300ms latency: The fastest real-time transcription speeds for human-like conversational AI experiences, real-time analytics, and enablement.
  - >90% accuracy: Deepgram leads the industry with the most accurate models on the market across use-case categories.
- Understanding:
  - Summarization
  - Sentiment analysis
  - Language translation
  - Speaker diarization
  - Language detection
  - And more...
- Custom Model Training: Deepgram will support customer-specific custom model training to ensure your model meets your business objectives.
Details
Pricing
Free trial
Dimension | Description | Cost/month |
---|---|---|
Enterprise Offering | Custom Enterprise Offering | $10,000,000.00 |
Cost per Transcription Hour | Deepgram charges per transcription hour | $1,250.00 |
Vendor refund policy
Deepgram Terms of Service: https://deepgram.com/terms/
Legal
Vendor terms and conditions
Content disclaimer
Delivery details
Software as a Service (SaaS)
SaaS delivers cloud-based software applications directly to customers over the internet. You can access these applications through a subscription model. You will pay recurring monthly usage fees through your AWS bill, while AWS handles deployment and infrastructure management, ensuring scalability, reliability, and seamless integration with other AWS services.
Resources
Vendor resources
Support
Vendor support
For sales, contracting, and usage inquiries, please email aws@deepgram.com.
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.


Standard contract
Customer reviews
Has delivered natural low-latency voice response but faces limitations with concurrent connections
What is our primary use case?
We are currently using Deepgram for text-to-speech, specifically the Nova model. Our typical use case is text-to-speech, primarily for building interactive speech-to-speech AI chatbots.
What is most valuable?
The most valuable capabilities of Deepgram that I've found so far include its low latency of less than 200 milliseconds, which no other text-to-speech model offers. The variety of voices is also good compared to others.
The positive impact and benefits I've seen from working with Deepgram include the very human-like and realistic quality of the voices. Deepgram's low latency has significantly improved our organization's customer service because, to the best of my knowledge, its latency is lower than that of any other text-to-speech provider, which I appreciate.
What needs improvement?
One issue we've faced relates to the pricing structure: with the pay-as-you-go model, we only get eight concurrent WebSocket connections for text-to-speech, which makes it difficult to scale. The enterprise plan costs around $25,000 to $30,000 USD per year regardless of usage and allows 100 concurrent connections, but that still doesn't provide enough scalability when our usage is high.
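Under a hard cap like the eight concurrent WebSocket connections described above, one common client-side pattern is to queue requests behind a semaphore so the application never opens more sessions than the plan allows. The following is a minimal Python sketch, assuming asyncio; `tts_session` is a hypothetical placeholder that simulates a TTS call with a short sleep rather than calling Deepgram's real API, and the limit of 8 is taken from the review, not from Deepgram documentation.

```python
import asyncio
from typing import Dict, List, Tuple

MAX_CONCURRENT = 8  # pay-as-you-go TTS WebSocket limit reported in the review


async def tts_session(sem: asyncio.Semaphore, state: Dict[str, int], text: str) -> str:
    """Hypothetical TTS call: holds one 'connection slot' while it runs."""
    async with sem:
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        try:
            await asyncio.sleep(0.01)  # placeholder for real WebSocket I/O
            return f"audio:{text}"
        finally:
            state["active"] -= 1


async def run_all(texts: List[str], limit: int = MAX_CONCURRENT) -> Tuple[List[str], int]:
    """Run every request, but never more than `limit` at the same time."""
    sem = asyncio.Semaphore(limit)
    state = {"active": 0, "peak": 0}
    results = await asyncio.gather(*(tts_session(sem, state, t) for t in texts))
    return list(results), state["peak"]


if __name__ == "__main__":
    audio, peak = asyncio.run(run_all([f"msg{i}" for i in range(20)]))
    print(len(audio), "requests, peak concurrency", peak)
```

Queuing only smooths bursts; it cannot raise the ceiling itself, which is why the reviewer ultimately moved to a provider with a higher connection limit.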
I have noticed that the WebSocket connection sometimes breaks due to inactivity, and a longer timeout period would be beneficial. If you leave the WebSocket connection to the TTS model idle for a certain period, it disconnects, and even during continuous use over long periods it occasionally returns an error. During a call with the support team, they acknowledged that there is an issue on their side and said they intend to improve stability in future releases, but since Deepgram is not open source, we don't have much control over that.
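Dropped or idle-timed-out sockets are usually handled on the client by reconnecting with exponential backoff. This is a minimal sketch under stated assumptions: `connect` is a hypothetical callable that opens the socket and raises the built-in `ConnectionError` on failure (real WebSocket libraries raise their own exception types), and the delays are illustrative. For idle timeouts specifically, Deepgram's streaming speech-to-text API documents a KeepAlive message; whether an equivalent applies to the TTS socket the reviewer used is not confirmed here.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 8.0) -> float:
    """Exponential backoff: base * 2^attempt seconds, capped at `cap`."""
    return min(cap, base * 2 ** attempt)


def connect_with_backoff(
    connect: Callable[[], T],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `connect` until it succeeds, sleeping with jittered backoff between failures.

    `connect` is a hypothetical placeholder for whatever opens the WebSocket.
    """
    for attempt in range(max_attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the last error
            # Random jitter spreads out reconnects from many clients at once.
            sleep(backoff_delay(attempt) * random.uniform(0.5, 1.0))
    raise ValueError("max_attempts must be at least 1")
```

Injecting `sleep` keeps the retry logic testable without real waiting.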
For how long have I used the solution?
I have been working with Deepgram for close to two years.
What was my experience with deployment of the solution?
We didn't have any issues integrating Deepgram's API into our existing workflows because the documentation is straightforward.
What do I think about the stability of the solution?
Deepgram has been stable and reliable; I haven't seen any issues except for some occasional connection losses.
What do I think about the scalability of the solution?
Scalability was our main problem. Because of the connection limits of the pay-as-you-go model, we switched to AWS two months ago and are no longer using Deepgram; AWS provides higher scalability, with 10,000 connections at a single go, despite higher latency than Deepgram.
How are customer service and support?
Based on my experience with them, I would rate the technical support a nine.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
We didn't try Deepgram's domain-specific vocabularies feature. We started with Deepgram and then switched to AWS.
How was the initial setup?
I didn't personally participate in the initial setup of Deepgram.
What about the implementation team?
We took Deepgram directly from the source and did not purchase it via any marketplace.
What was our ROI?
We haven't seen a return on investment with Deepgram so far; we had been building POCs for two years but switched to AWS two months ago due to scalability issues with the pay-as-you-go model.
Which other solutions did I evaluate?
We did not evaluate other options or vendors before choosing Deepgram. The factors we considered before choosing Deepgram included its lower error rate compared to other available providers and its very low latency of around 200 milliseconds, which was significantly lower than other models in the industry at that time.
What other advice do I have?
We are an end-user of Deepgram, using it as a service. Regarding transcription, we haven't used speech-to-text models; we focus on text-to-speech, which has been good, and I haven't noticed any errors there. I have about five years of experience in my current field. Overall, based on my experience, I would rate Deepgram an eight.
A Powerful, Adaptable, and Constantly Evolving STT Solution for Voice Automation
What is our primary use case?
For the last two years, our primary use case for Deepgram has been to power sophisticated, AI-driven voice bots for major US clients.
The technical workflow is as follows:
- A client initiates a call to a Twilio number.
- Our system captures the audio and streams it in real time to Deepgram's Speech-to-Text service.
- Deepgram transcribes the speech into text with high accuracy.
- This text is then passed to a Large Language Model (LLM) to analyze and determine the user's intent.
- Based on the identified intent, we trigger the appropriate backend functions to generate a relevant response.
- Finally, we use a Text-to-Speech (TTS) engine, such as ElevenLabs, to convert the response back into audio and play it for the user.
The entire process is built upon the speed and reliability of Deepgram's transcription. Our environment is deployed on the Public Cloud, specifically using Amazon Web Services (AWS).
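The turn-by-turn flow described above can be sketched as a small orchestration class. This is an illustrative Python sketch only: each stage is injected as a plain function, so `transcribe`, `detect_intent`, the `handlers` map, and `synthesize` are hypothetical stand-ins for the Deepgram STT, LLM, backend, and TTS integrations, not real SDK calls. Real-time streaming and Twilio media handling are left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class VoiceBotPipeline:
    transcribe: Callable[[bytes], str]          # stand-in for Deepgram speech-to-text
    detect_intent: Callable[[str], str]         # stand-in for the LLM intent step
    handlers: Dict[str, Callable[[str], str]]   # backend functions keyed by intent
    synthesize: Callable[[str], bytes]          # stand-in for a TTS engine
    fallback: str = "Sorry, could you repeat that?"

    def handle_turn(self, audio: bytes) -> bytes:
        """Audio in, audio out: STT -> intent -> backend handler -> TTS."""
        text = self.transcribe(audio)
        intent = self.detect_intent(text)
        handler = self.handlers.get(intent)
        reply = handler(text) if handler else self.fallback  # unknown intent
        return self.synthesize(reply)


if __name__ == "__main__":
    # Fake stages so the flow can be exercised without any external service.
    bot = VoiceBotPipeline(
        transcribe=lambda audio: audio.decode("utf-8"),
        detect_intent=lambda text: "hours" if "open" in text else "unknown",
        handlers={"hours": lambda text: "We open at 9 a.m."},
        synthesize=lambda reply: reply.encode("utf-8"),
    )
    print(bot.handle_turn(b"when do you open"))
```

Keeping each stage injectable means a single vendor (STT or TTS) can be swapped without touching the intent logic, and each stage can be tested with fakes as above.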
What is most valuable?
Here are the features I've found most valuable:
- Continuous Innovation and Responsiveness: I find it incredibly valuable that Deepgram is not a static product. They are constantly evolving and genuinely listen to user feedback. The evolution from their Nova models to the new Flux model, which was specifically designed to solve end-of-speech detection for conversational AI, is a perfect example. It shows they are committed to solving real-world problems for their users.
- High Accuracy and Reliability: For my voice bot solutions, accuracy is non-negotiable. The models are remarkably accurate, performing at 90-92% accuracy even in challenging conditions such as background noise and a wide range of international accents. Furthermore, the service has been incredibly stable; in my four years of using it, we've never experienced downtime.
- Excellent Configurability and Ease of Integration: Deepgram offers a level of granular control that allows me to fine-tune the STT engine's behavior, which is a significant advantage over competitors. This flexibility, combined with straightforward integration, extensive documentation, and robust code examples, allows my team to be highly efficient.
- Cost-Effectiveness and Scalability: The pay-as-you-go pricing model is both affordable and transparent. It provides a significant return on investment because it satisfies all our primary requirements—technical accuracy, ease of integration, and low implementation cost—within a scalable and predictable financial model.
- Outstanding Customer Support: The support team is brilliant and always ready to assist. Having access to official support channels, active community forums, and frequent webinars ensures that we are never without resources, which is crucial for a business-critical application.
What needs improvement?
Honestly, Deepgram has been exceptionally proactive in addressing the primary area that needed improvement. My main challenge was with the real-time detection of when a user has finished speaking in a live conversation, which is critical for a responsive voice bot. They directly solved this by releasing their Flux model.
Because Flux is a recent release, I haven't yet had enough time to thoroughly test it and identify new limitations. At this stage, any "improvement" would be more of a "nice-to-have" feature rather than a fix for an existing problem. The core service is already very robust and meets all of our current needs.
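Before Flux, end-of-turn detection on Deepgram's streaming API was commonly built from the `is_final` and `speech_final` flags on `Results` messages, plus an `UtteranceEnd` message when the `utterance_end_ms` option is enabled. The sketch below assumes that message shape (simplified to the fields it reads) and groups finalized transcript fragments into user turns; it does not cover Flux, which the reviewer says was designed to solve this problem directly.

```python
from typing import Iterable, List


def assemble_turns(messages: Iterable[dict]) -> List[str]:
    """Group finalized streaming transcripts into complete user turns."""
    turns: List[str] = []
    buffer: List[str] = []
    for msg in messages:
        kind = msg.get("type")
        if kind == "Results":
            if not msg.get("is_final"):
                continue  # interim hypothesis; the text may still change
            text = msg["channel"]["alternatives"][0]["transcript"].strip()
            if text:
                buffer.append(text)
            # speech_final marks a detected pause: treat it as end of turn.
            if msg.get("speech_final") and buffer:
                turns.append(" ".join(buffer))
                buffer = []
        elif kind == "UtteranceEnd" and buffer:
            # Fallback end-of-turn signal based on gaps between words.
            turns.append(" ".join(buffer))
            buffer = []
    if buffer:
        turns.append(" ".join(buffer))  # flush whatever remains when the stream closes
    return turns
```

In a live voice bot each completed turn would be handed to the LLM as soon as it is emitted rather than collected in a list.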
What additional features should be included in the next release?
Looking toward the future, here are a few features that could add even more value to an already excellent platform:
- Advanced Built-in Analytics: While I can get the raw transcript and build my own analytics pipeline, it would be powerful to have features like sentiment analysis, emotion detection, or automatic summarization offered directly through the API. This would save significant development time.
- More Granular Speaker Diarization: For calls with multiple participants, enhancing the real-time speaker diarization (labeling who is speaking) to be even more precise would be a fantastic addition for creating detailed call analyses.
- Tighter Integration with TTS: Since Deepgram is also expanding into Text-to-Speech (TTS), offering a more seamlessly integrated STT-to-TTS pipeline could simplify the development stack for creating voice agents from start to finish.
- Specialized, Pre-Trained Industry Models: While the general models are highly accurate, offering even more specialized, pre-trained models for specific industries like finance, healthcare, or legal, which are heavy on specific jargon, could push the accuracy even higher for those niche use cases.
For how long have I used the solution?
I have been using the solution for four years.
What do I think about the stability of the solution?
Based on my experience, my impression is that the solution is exceptionally stable.
We have never experienced any downtime. Their service is very transparent, and they even provide a status page where you can check the availability of their systems. It's a reliable and robust platform that we can depend on for our business-critical voice bot applications.
What do I think about the scalability of the solution?
We have never faced any issues with downtime or performance, even as our usage has grown. The architecture is clearly built to handle high volumes of real-time transcription. Furthermore, its pay-as-you-go, usage-based pricing model directly supports this scalability, making it financially viable to grow our services without being locked into a rigid plan. It's a system that scales seamlessly both technically and financially.
How are customer service and support?
Based on my experience, the customer service and support from Deepgram have been outstanding.
The support team is brilliant, highly reachable, and always ready to assist whenever we have a question or need help. It's a comprehensive support system that goes beyond just a direct contact channel; we have access to official support, very active community forums, and they frequently schedule webinars to share announcements and updates.
I've always felt that there are plenty of resources available, and we've never been left without a solution. It's a very real and accessible support system: a simple email or call gets you the assistance you need.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
Yes, I did. Initially, I used AssemblyAI in parallel with Deepgram while evaluating the best solution for our needs.
I made the switch to using Deepgram exclusively because of its superior configurability. While AssemblyAI is a solid product, I found that Deepgram provides a much deeper, more granular level of control. It allows me to fine-tune the behavior of the STT engine down to a micro-level, which is critical for optimizing the performance and accuracy of our voice bots. That ability to precisely tailor the service to our specific use case is why Deepgram ultimately stood out as the better choice for us.
How was the initial setup?
The initial setup was very straightforward.
It was a simple "Do-It-Yourself" (DIY) process that our in-house team handled entirely on our own, without needing to involve any external vendors. The primary reasons it was so easy were the extensive resources Deepgram provides:
- Excellent Documentation: The documentation is clear, comprehensive, and easy to follow.
- Rich Code Samples: They have robust GitHub repositories filled with plenty of examples and code samples in multiple languages, including Python, Java, and JavaScript. This made integration into our existing systems much faster.
- Strong Community and Support: The availability of an active support community meant that if we had any questions, resources were readily available.
These factors combined made the implementation and integration process smooth and efficient.
What about the implementation team?
We implemented the solution entirely with our in-house team. It was a straightforward process, and we did not involve any vendors.
What was our ROI?
Our return on investment (ROI) with Deepgram has been excellent, although I don't track it as a specific percentage. The value comes from several key areas:
- Low Implementation Cost: The solution is very developer-friendly with great documentation, which allowed our in-house team to integrate it quickly without needing to hire external vendors. This significantly reduced our initial investment.
- Cost-Effective Operational Model: The pay-as-you-go pricing is transparent and affordable. It scales directly with our usage, which means our costs are always aligned with our business volume, preventing large, unnecessary expenses.
- High-Value Enabler: The primary ROI comes from the fact that Deepgram's high accuracy and reliability are the foundation of our voice bot service. It enables us to deliver a high-quality product to our clients, which in turn generates our revenue. The investment in Deepgram directly translates to our ability to operate and grow our business.
In short, the ROI is demonstrated by low initial costs, predictable operational expenses, and the high quality of the core technology that powers our entire service offering.
Which other solutions did I evaluate?
Yes, before committing to Deepgram as our primary solution, I evaluated other options. The main competitor I looked at was AssemblyAI.
I used both AssemblyAI and Deepgram in parallel for a period to directly compare their performance in our real-world use cases. While AssemblyAI is also a good service, I ultimately chose Deepgram because it offered significantly more configurability. This allowed me to fine-tune the Speech-to-Text engine at a much more granular level, which was crucial for achieving the highest possible accuracy and performance for our specific voice bot applications.
What other advice do I have?
Yes, I absolutely have some advice for anyone considering or currently using Deepgram.
- Don't Settle for the Defaults: The single biggest advantage of Deepgram over its competitors is its deep configurability. My advice is to really spend time with their documentation and API parameters. You can fine-tune the models to your specific audio environment, the accents you typically encounter, and the vocabulary relevant to your industry. This is where you can move from 90% accuracy to 95% or higher for your specific use case.
- Stay Engaged with Their Updates: Deepgram innovates at a rapid pace. The release of the Flux model is a perfect example of how they solve real-world problems their users are facing. I highly recommend subscribing to their newsletters and attending their webinars. You might find that they've released a new feature or model that directly addresses a challenge you're working on, saving you significant development effort.
- Leverage the Full Ecosystem: Think of Deepgram as the first crucial step in a larger data pipeline. The real power is unlocked when you connect its highly accurate transcripts to other services. As in my use case, feeding the text into an LLM for intent recognition, sentiment analysis, or summarization opens up a world of possibilities. You can analyze sales calls, automate customer support, or create detailed meeting summaries.
- Use the Community and Support: Don't hesitate to engage with their support channels or community forums if you run into issues. My experience has been that they are incredibly responsive and helpful. The community is also active, and it's likely someone else has faced and solved a similar problem to yours.
In summary, my advice is to be an active user. The more you explore the platform's capabilities and stay current with its evolution, the greater the return on your investment will be. It's a top-tier solution that rewards a hands-on approach.
Has reduced response time and replaced human support with accurate bilingual transcriptions
What is our primary use case?
I use Deepgram for a company that asked me to implement an AI voice agent for a security application that alerts neighbors to nearby incidents in their neighborhoods.
I implemented this in January 2025 and used Deepgram to transcribe those conversations for three months. I love the technology because it transcribes all the conversations very well, which made the implementation relatively easy.
My main use case for Deepgram is transcription. Since this is a Spanish company, I dug into several use cases and configuration settings to tune transcriptions that mix Spanish and English words.
Deepgram handled these bilingual conversations once we adjusted some settings. For example, the company's name is in English while the conversations were in Spanish, so we needed to configure Deepgram to transcribe it accurately, because Vapi passed that transcription to the LLM agent, which then spoke those words through the agent's voice. Overall, the bilingual transcriptions were quite precise; there is room for improvement, but the results met expectations, making Deepgram a good fit for that work.
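One way to keep an English brand name intact in a Spanish-language stream is to set the transcription language and boost the term. The sketch below only builds a streaming URL; it assumes a Nova-2 model, where Deepgram's `keywords` parameter takes `word:boost` values (newer models use a different key-term mechanism), and `Acme` is a hypothetical company name. The reviewer configured this through Vapi, so the exact settings used there may differ.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlencode


def build_streaming_url(language: str, keywords: Dict[str, int], model: str = "nova-2") -> str:
    """Build a Deepgram live-transcription URL with boosted keywords (assumed Nova-2 style)."""
    params: List[Tuple[str, str]] = [
        ("model", model),
        ("language", language),
        ("smart_format", "true"),
    ]
    # A list of tuples lets the same `keywords` key repeat once per term.
    for word, boost in keywords.items():
        params.append(("keywords", f"{word}:{boost}"))
    return "wss://api.deepgram.com/v1/listen?" + urlencode(params)


if __name__ == "__main__":
    print(build_streaming_url("es", {"Acme": 2}))
```

`urlencode` percent-encodes the colon in `Acme:2`, which the server decodes back to the original value.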
What is most valuable?
The best feature Deepgram offers for me is mainly the transcription, which I think is the most robust among providers, since Deepgram does the job quite well.
Deepgram's transcription stands out compared to other solutions primarily because of its speed and accuracy. Those points matter to me because not all providers or tools handled Spanish well, but Deepgram adapted perfectly to that use case. We also chose an ElevenLabs voice with a South American accent, which worked very well with Deepgram.
Deepgram has positively impacted my organization by delivering the results we wanted. The voice agent replaced the human agents who managed the calls, which saved the support team a lot of time, improved response time, and reduced the time those human agents had to dedicate to calls.
What needs improvement?
Regarding improvements for Deepgram, I think the quality of the transcriptions could be enhanced. Spanish accents pose challenges and make some words harder to transcribe, and supporting additional accents, such as those of Chilean or Argentine speakers, could improve the model's performance with local words.
I don't have any additional improvements for Deepgram besides those I mentioned earlier about Spanish accents and transcription quality.
For how long have I used the solution?
I have been working in my current field for 14 years, and since I am 40, I have quite a bit of experience.
What do I think about the stability of the solution?
In my experience, Deepgram is very stable, as I haven't encountered any downtime or issues.
What do I think about the scalability of the solution?
Deepgram's scalability has been fine; there were some limit issues with Vapi, but those issues stemmed from the Vapi platform and not Deepgram itself.
How are customer service and support?
I did not interact with customer support for Deepgram, so I cannot comment on my experience with them.
How would you rate customer service and support?
Neutral
Which solution did I use previously and why did I switch?
Before Deepgram, we used other transcribers, but I don't remember which ones. They didn't work well, which prompted us to switch.
What was our ROI?
I have seen a return on investment both in time saved and in fewer employees needed.
What's my experience with pricing, setup cost, and licensing?
My experience with pricing, setup cost, and licensing was good; I found it cheaper than alternatives, with no problems.
Which other solutions did I evaluate?
Before choosing Deepgram, I did evaluate other options, but it was mainly a decision based on the integration with Vapi.
What other advice do I have?
My advice for others looking into using Deepgram is to read the documentation because the API is very flexible, and I encourage them to just test it out as it's a wonderful technology.
I was offered an AWS gift card for this review.
On a scale of 1-10, I rate Deepgram a 9.
Handles large data, good documentation is available and powerful model
What is our primary use case?
I use Deepgram for audio transcription and speech recognition. I am working on a feedback survey app where users provide verbal feedback that Deepgram transcribes into text.
We receive the results with features such as punctuation and Smart Format enabled.
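For the pre-recorded feedback clips described above, punctuation and Smart Format are enabled with query parameters on Deepgram's `/v1/listen` endpoint. This is a minimal sketch using only the Python standard library; it builds the request without sending it, the `nova-2` default and `audio/wav` content type are assumptions, and `api_key` is a placeholder. The reviewer used Deepgram's Node.js packages instead.

```python
import urllib.parse
import urllib.request

DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_transcription_request(
    audio: bytes, api_key: str, mimetype: str = "audio/wav", model: str = "nova-2"
) -> urllib.request.Request:
    """Prepare (but do not send) a pre-recorded transcription request."""
    params = {"model": model, "punctuate": "true", "smart_format": "true"}
    url = f"{DEEPGRAM_LISTEN_URL}?{urllib.parse.urlencode(params)}"
    return urllib.request.Request(
        url,
        data=audio,  # raw audio bytes go in the POST body
        method="POST",
        headers={"Authorization": f"Token {api_key}", "Content-Type": mimetype},
    )
```

Sending it with `urllib.request.urlopen(req)` should return JSON with the transcript under `results.channels[0].alternatives[0].transcript`.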
How has it helped my organization?
Deepgram has significantly improved our transcription process in terms of speed and accuracy. It has allowed us to efficiently convert verbal feedback into text, enabling quicker analysis and implementation of new features.
Integrating Deepgram has streamlined our workflow, enhancing productivity and delivering more accurate transcription results.
What is most valuable?
We previously used IBM Watson, which was slow and had limitations in accurately transcribing words. After evaluating OpenAI's Whisper model, we discovered Deepgram, which incorporates Whisper and adds the powerful Nova model.
Deepgram's latency is impressively low, around 0.5 to 1 second, making it a superior choice.
What needs improvement?
Live transcription could be improved. Sometimes Deepgram's WebSocket connection is dropped due to redundancy issues. Enhanced stability in live transcription would be beneficial.
For how long have I used the solution?
I have been using Deepgram for one and a half years.
What do I think about the stability of the solution?
Initially, we encountered some stability issues, but Deepgram has since improved its architecture. With the addition of hooks for status updates, the accuracy has improved to approximately 90 to 95%, which is better than other models we've tested.
What do I think about the scalability of the solution?
It's scalable. Our platform handles 50 to 60 users simultaneously without compromising accuracy. For instance, a 20-minute audio file was transcribed within a second, demonstrating its ability to handle large volumes of audio data effectively.
How are customer service and support?
My experience with customer service and support has been positive. They are responsive and helpful, and they provide timely resolutions to any issues.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
We previously used IBM Watson, but it didn't deliver appropriate results. We searched for alternatives and found OpenAI's Whisper model, which was initially slow. After thorough analysis, we discovered Deepgram. It proved to be superior, leading to our decision to migrate. We used a detailed spreadsheet to compare various models before making the switch.
How was the initial setup?
Thanks to clear documentation, the initial setup was very easy. If you have prerequisite knowledge of the programming language you're using, it’s straightforward to follow the documentation and implement it into your system. When I started, I closely followed the documentation, which made the process very manageable.
Deployment model: We last deployed it on the Google Cloud Platform (GCP).
What about the implementation team?
The implementation was done in-house.
What was our ROI?
Our ROI has increased due to enhanced transcription accuracy and speed, leading to more efficient workflows and better user satisfaction.
What's my experience with pricing, setup cost, and licensing?
The pricing is moderate. While live transcription may incur some charges when the connection is open, they become minimal over time. So, it's a balanced option—neither cheap nor overly expensive.
Which other solutions did I evaluate?
Yes, besides IBM Watson, we evaluated OpenAI's Whisper model.
What other advice do I have?
Deepgram is highly recommended. Users don’t need to do anything special before using it, as the documentation is comprehensive. I am a Node.js developer and have used Deepgram packages for Node.js. Understanding your programming language is key, whether it's Node.js, Python, or others.
AI Features:
I have integrated various AI models into our application. Deepgram's sentiment analysis feature allows us to create graphs and analyses to determine if words are positive, negative, or neutral. This helps us summarize feedback and derive actionable insights.
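To turn sentiment results into the positive/negative/neutral charts described above, the per-segment labels can simply be tallied. This sketch assumes a response shaped like `results.sentiments.segments[]`, with each segment carrying a `sentiment` label; treat that shape as an assumption to verify against Deepgram's current API reference. Only the fields used are read, and missing keys yield zero counts.

```python
from collections import Counter
from typing import Dict


def sentiment_counts(response: dict) -> Dict[str, int]:
    """Count positive/neutral/negative segments in a sentiment-analysis response."""
    segments = response.get("results", {}).get("sentiments", {}).get("segments", [])
    # Pre-seed the labels so the chart always has all three bars.
    counts = Counter({"positive": 0, "neutral": 0, "negative": 0})
    for seg in segments:
        label = seg.get("sentiment")
        if label in counts:
            counts[label] += 1
    return dict(counts)
```

The resulting dictionary can be fed directly to any charting library to plot the share of each sentiment across feedback responses.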
My ratings:
I would rate it an eight out of ten. The live transcription feature needs improvement as the WebSocket sometimes gives errors or breaks down during live streams.
Used for TTS (Text-to-Speech) and STT (Speech-to-Text) purposes
What is our primary use case?
We use the solution for TTS (Text-to-Speech) and STT (Speech-to-Text) purposes.
What is most valuable?
The solution's Speech-to-Text conversion feature is really awesome.
What needs improvement?
Deepgram is currently restricted to only the English variants, but it should include other languages, such as German or French.
For how long have I used the solution?
I have been using Deepgram for five to six months.
What do I think about the stability of the solution?
Deepgram is a stable solution.
What do I think about the scalability of the solution?
The Deepgram cloud can handle large volumes of audio data. Around three to four people use the solution in our organization.
How was the initial setup?
The solution’s initial setup is easy.
What's my experience with pricing, setup cost, and licensing?
Deepgram is a cheap solution. We could create an account with $200 of initial credit to use for Deepgram services.
What other advice do I have?
I have used Deepgram with Twilio for the calling system. I would recommend Deepgram to users who want to use it for speech-to-text purposes.
Overall, I rate the solution an eight out of ten.