    Speech-to-Text & Text-to-Speech GenAI API

    Sold by: Deepgram 
    Free Trial
    Deepgram, Language AI models to power your apps.

    Overview

    Deepgram language AI models power your apps with world-class speech-to-text and domain-specific language models (DSLMs). Effortlessly accurate. Blazing fast. Enterprise-ready scale. Unbeatable pricing. Everything developers need to build with confidence and ship faster.

    For questions and custom quote options, reach out to us at aws@deepgram.com.

    Highlights

    • Transcription (STT)
      - 20x faster: Transcribe in real time, or transcribe an hour of pre-recorded audio in about 12 seconds.
      - <300ms latency: The fastest real-time transcription speeds for human-like conversational AI experiences, real-time analytics, and enablement.
      - >90% accuracy: Deepgram leads the industry with the most accurate models on the market across use case categories.
    • Understanding
      - Summarization
      - Sentiment analysis
      - Language translation
      - Speaker diarization
      - Language detection
      - And more...
    • Custom Model Training
      - Deepgram supports customer-specific custom model training to ensure your model meets your business objectives.

    Details

    Delivery method

    Deployed on AWS


    Features and programs

    Financing for AWS Marketplace purchases

    AWS Marketplace now accepts line of credit payments through the PNC Vendor Finance program. This program is available to select AWS customers in the US, excluding NV, NC, ND, TN, & VT.

    Pricing

    Free trial

    Try this product free according to the free trial terms set by the vendor.

    Speech-to-Text & Text-to-Speech GenAI API

    Pricing is based on the duration and terms of your contract with the vendor. This entitles you to a specified quantity of use for the contract duration. If you choose not to renew or replace your contract before it ends, access to these entitlements will expire.
    Additional AWS infrastructure costs may apply. Use the AWS Pricing Calculator to estimate your infrastructure costs.

    1-month contract (2)

    Dimension                    Description                               Cost/month
    Enterprise Offering          Custom Enterprise Offering                $10,000,000.00
    Cost per Transcription Hour  Deepgram charges per transcription hour   $1,250.00

    Vendor refund policy

    Deepgram Terms of Service: https://deepgram.com/terms/ 

    Custom pricing options

    Request a private offer to receive a custom quote.


    Legal

    Vendor terms and conditions

    Upon subscribing to this product, you must acknowledge and agree to the terms and conditions outlined in the vendor's End User License Agreement (EULA).

    Content disclaimer

    Vendors are responsible for their product descriptions and other product content. AWS does not warrant that vendors' product descriptions or other product content are accurate, complete, reliable, current, or error-free.

    Usage information

    Delivery details

    Software as a Service (SaaS)

    SaaS delivers cloud-based software applications directly to customers over the internet. You can access these applications through a subscription model. You will pay recurring monthly usage fees through your AWS bill, while AWS handles deployment and infrastructure management, ensuring scalability, reliability, and seamless integration with other AWS services.

    Resources

    Support

    Vendor support

    For sales, contracting, and usage inquiries, please email aws@deepgram.com.

    AWS infrastructure support

    AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.

    Product comparison

    Updated weekly

    Accolades

    Top 10 in Scheduling & Coordination, Speech Recognition, Sales & Marketing
    Top 10 in Speech to Text, Customer Support, Speech Recognition
    Top 100 in Natural Language Processing

    Customer reviews

    Sentiment is AI generated from actual customer reviews on AWS and G2

    Reviews    Functionality      Ease of use        Customer service   Cost effectiveness
    1 review   Insufficient data  Insufficient data  Insufficient data  Insufficient data
    0 reviews  Insufficient data  Insufficient data  Insufficient data  Insufficient data

    Legend: Positive reviews, Mixed reviews, Negative reviews

    Overview

    AI generated from product descriptions
    Speech Recognition Speed: Real-time transcription with 20x faster processing, capable of transcribing an hour of audio in approximately 12 seconds
    Latency Performance: Ultra-low latency under 300 milliseconds for near-instantaneous speech-to-text conversion
    Accuracy Metrics: Over 90% transcription accuracy across multiple use case categories
    Language Understanding Capabilities: Advanced natural language processing features including summarization, sentiment analysis, speaker diarization, and language detection
    Model Customization: Support for customer-specific custom model training to align with unique business requirements

    Speech Recognition: Advanced multilingual speech recognition with high accuracy and low word error rates
    Language Processing: Support for 99+ languages with automatic language detection and custom vocabulary capabilities
    Audio Intelligence: Comprehensive suite of AI models including speaker diarization, sentiment analysis, content moderation, and PII redaction
    Large Language Model Integration: LeMUR framework for processing audio transcripts using advanced language model capabilities
    Transcription Flexibility: Support for async and real-time transcription with multiple file type compatibility across 33 audio and video formats

    Natural Language Understanding: Advanced proprietary Large Language Model (ConveRT) trained specifically for customer service applications
    Speech Recognition Technology: Spoken language understanding system capable of processing diverse accents, dialects, and background noise
    Conversational AI Architecture: Customer-led conversational assistant platform enabling natural language interaction with interruption and topic flexibility
    Language Processing Capability: Multi-language support with ability to understand and respond across different linguistic contexts
    Dialogue Management: Customizable conversational assistant deployment with continuous improvement through expert dialogue systems scientists and machine learning developers

    Contract

    Standard contract: No, No, No

    Customer reviews

    Ratings and reviews

    4.5 (1 rating)

    5 star: 0%
    4 star: 100%
    3 star: 0%
    2 star: 0%
    1 star: 0%

    1 AWS review | 4 external reviews
    Star ratings include only reviews from verified AWS customers. External reviews can also include a star rating, but star ratings from external reviews are not averaged in with the AWS customer star ratings.
    Umar Ijaz

    Handles large data, good documentation is available and powerful model

    Reviewed on Jun 27, 2024
    Review provided by PeerSpot

    What is our primary use case?

    I use Deepgram for audio transcriptions and speech recognition. I am working on a feedback survey app where users provide verbal feedback that Deepgram transcribes into text.

    We receive the results and implement features like punctuation and Smart Format.

    How has it helped my organization?

    Deepgram has significantly improved our transcription process in terms of speed and accuracy. It has allowed us to efficiently convert verbal feedback into text, enabling quicker analysis and implementation of new features. 

    Integrating Deepgram has streamlined our workflow, enhancing productivity and delivering more accurate transcription results.

    What is most valuable?

    We previously used IBM Watson, which was slow and had limitations in accurately transcribing words. After evaluating OpenAI's Whisper model, we discovered Deepgram, which incorporates Whisper and adds the powerful Nova model.

    Deepgram's latency is impressively low, around 0.5 to 1 second, making it a superior choice.

    What needs improvement?

    Live transcription could be improved. Sometimes, Deepgram's WebSocket is disposed of due to redundancy issues. Enhanced stability in live transcription would be beneficial.
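
    One common way to soften dropped live connections like the ones described above is to reopen the stream with exponential backoff. The following is a minimal sketch only: `open_stream` is a hypothetical callable standing in for whatever code opens the live transcription connection, and it is assumed to raise `ConnectionError` when the connection drops. Nothing here is taken from Deepgram's SDK.

```python
import time


def run_with_reconnect(open_stream, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call open_stream(), retrying with exponential backoff on ConnectionError.

    open_stream is a hypothetical callable supplied by the caller; it is not
    part of any real SDK. The delay doubles after every failed attempt.
    """
    for attempt in range(max_attempts):
        try:
            return open_stream()
        except ConnectionError:
            # Give up and re-raise once the last allowed attempt has failed.
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

    The `sleep` parameter is injectable so the retry schedule can be checked without actually waiting.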

    For how long have I used the solution?

    I have been using Deepgram for one and a half years.

    What do I think about the stability of the solution?

    Initially, we encountered some stability issues, but Deepgram has since improved its architecture. With the addition of hooks for status updates, the accuracy has improved to approximately 90 to 95%, which is better than other models we've tested.

    What do I think about the scalability of the solution?

    It's scalable. Our platform handles 50 to 60 users simultaneously without compromising accuracy. For instance, a 20-minute audio file was transcribed within a second, demonstrating its ability to handle large volumes of audio data effectively.

    How are customer service and support?

    My experience with customer service and support has been positive. They are responsive and helpful, and they provide timely resolutions to any issues.

    How would you rate customer service and support?

    Positive

    Which solution did I use previously and why did I switch?

    We previously used IBM Watson, but it didn't deliver appropriate results. We searched for alternatives and found OpenAI's Whisper model, which was initially slow. After thorough analysis, we discovered Deepgram. It proved to be superior, leading to our decision to migrate. We used a detailed spreadsheet to compare various models before making the switch.

    How was the initial setup?

    Thanks to clear documentation, the initial setup was very easy. If you have prerequisite knowledge of the programming language you're using, it’s straightforward to follow the documentation and implement it into your system. When I started, I closely followed the documentation, which made the process very manageable.

    Deployment model: We last deployed it on the Google Cloud Platform (GCP).

    What about the implementation team?

    The implementation was done in-house.

    What was our ROI?

    Our ROI has increased due to enhanced transcription accuracy and speed, leading to more efficient workflows and better user satisfaction.

    What's my experience with pricing, setup cost, and licensing?

    The pricing is moderate. While live transcription may incur some charges when the connection is open, they become minimal over time. So, it's a balanced option—neither cheap nor overly expensive.

    Which other solutions did I evaluate?

    Yes, besides IBM Watson, we evaluated OpenAI's Whisper model.

    What other advice do I have?

    Deepgram is highly recommended. Users don’t need to do anything special before using it, as the documentation is comprehensive. I am a Node.js developer and have used Deepgram packages for Node.js. Understanding your programming language is key, whether it's Node.js, Python, or others.

    AI Features:

    I have integrated various AI models into our application. Deepgram's sentiment analysis feature allows us to create graphs and analyses to determine if words are positive, negative, or neutral. This helps us summarize feedback and derive actionable insights.

    My ratings:

    I would rate it an eight out of ten. The live transcription feature needs improvement as the WebSocket sometimes gives errors or breaks down during live streams.

    Which deployment model are you using for this solution?

    Public Cloud

    If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

    Google
    Arslan Rasheed

    Used for TTS (Text-to-Speech) and STT (Speech-to-Text) purposes

    Reviewed on Jun 25, 2024
    Review provided by PeerSpot

    What is our primary use case?

    We use the solution for TTS (Text-to-Speech) and STT (Speech-to-Text) purposes.

    What is most valuable?

    The solution's Speech-to-Text conversion feature is really awesome.

    What needs improvement?

    Deepgram is currently restricted to only the English variants, but it should include other languages, such as German or French.

    For how long have I used the solution?

    I have been using Deepgram for five to six months.

    What do I think about the stability of the solution?

    Deepgram is a stable solution.

    What do I think about the scalability of the solution?

    The Deepgram cloud can handle large volumes of audio data. Around three to four people use the solution in our organization.

    How was the initial setup?

    The solution’s initial setup is easy.

    What's my experience with pricing, setup cost, and licensing?

    Deepgram is a cheap solution. We can create an account for $200, which we can initially use for the Deepgram services.

    What other advice do I have?

    I have used Deepgram with Twilio for the calling system. I would recommend Deepgram to users who want to use it for speech-to-text purposes.

    Overall, I rate the solution an eight out of ten.

    Boris Morozov

    Offers great speed during transcription compared to other tools

    Reviewed on Jun 24, 2024
    Review provided by PeerSpot

    What is our primary use case?

    I am the software development team lead in my company, and we are creating a speech recognition product based on a few different engines. The company works in the legal process, so my team generates a machine-based transcript and then converts it to a readable format. It saves time for the person using the machine-based transcript as a starting point.

    What is most valuable?

    The solution's most valuable feature is its speed of transcription. It is one of the fastest tools: compared with the second-fastest solution you can get, it is 20 times faster. Thus, it is not just a marginally faster product.

    What needs improvement?

    In comparison to Deepgram, I would say that the transcript accuracy offered by other products is much higher. We ran a blind test in our company: five jobs with five different transcribers, and each job was generated by two different engines, one of which was Deepgram. The transcribers received both versions without knowing which engine had produced which, and they had to choose the version they thought would take less work to convert into the final, legally binding document. Since they were paid per page, they had a clear incentive to choose the version that let them finish with as little work as possible. We didn't have a huge sample size because we are a small company, but we found that the other tools were marginally more accurate than Deepgram.

    I would like it to be more accurate. I can get maintainability and faster transcripts with the perfect features with an improved tool.

    For how long have I used the solution?

    I have been using Deepgram for a year. I use the solution's latest version. I am an end-user of the tool.

    What do I think about the stability of the solution?

    I don't use all of the features in the tool because Deepgram adds new features every few months. The features that I have been using in the tool have been very stable. I have never had any issues with the tool and it has never crashed. In our company, we just update to the latest version, and see if there are no issues until the next update.

    What do I think about the scalability of the solution?

    I use the tool as a software developer and transcriber. Though I don't know the number of transcribers that work in our company, I can say that it runs into hundreds.

    How are customer service and support?

    I contacted the solution's technical support for help, and I got a nice, decent service and had no complaints at all. At the moment, whenever you want to update Deepgram by yourself, it is a very easy process. You just get the version of the tool and pull it from Docker. If you want to update the model, you have to contact the support anyway. I haven't contacted the support team very often. I might have contacted the product's support team three times just to update the new versions of the model. I got decent support from the tool's support team.

    Which solution did I use previously and why did I switch?

    My company still uses multiple products in parallel to Deepgram. For example, in transcript-related business, some clients want to get the transcript super fast or in a few hours. You need to produce the machine transcript in a few minutes and give it to the transcriber, who should start working immediately on it. In such cases, there is a huge improvement when one uses Deepgram, which is a major advantage.

    How was the initial setup?

    If I consider and compare the other engines I have used with Deepgram, I would say that the ease of installation is one of the strong points of the product. Compared to all other engines, the installation of Deepgram has been simpler and far more stable. It just gets updated, and it runs properly. The good thing is that the tool is modular. In the tool, the modules and the engine itself are separate objects. If you want to update only the module, then there is no need to redeploy anything since most other engines that we have in our company are on an on-premises model.

    The solution is deployed on on-premises and cloud environments. It is deployed in the private section of our company's cloud. We aren't using the API and prefer to use our own deployment.

    What's my experience with pricing, setup cost, and licensing?

    When using Deepgram, you pay for the hours or minutes of transcription you need. The more hours you commit to in advance, the cheaper the price. It is slightly cheaper than the other engines we used. Keep in mind that you usually also pay for cloud resources, even if you run an on-premises deployment. Because transcription is fast, you can theoretically save twice: once on the usage fee and again on cloud resources. If you finish transcribing in a minute instead of an hour and your pipeline scales down, you only pay for that minute.

    What other advice do I have?

    Whether you should use the tool depends on your use cases. If maintainability, meaning how easy the engine is to maintain, is one of your priorities, you should definitely go with Deepgram.

    Speaking about how Deepgram handles large volumes of audio data without compromising accuracy, it is always a trade-off. As I said, Deepgram is not as accurate as the other engines we're using, but the difference is marginal, and if speed is more important to you, you should go with Deepgram. I think if the accuracy of the transcript is far more important, other engines give better results.

    For an end user, the tool offers on-premises deployment, and it also has an API. Using the API is super easy since you just log in to the site, create an account, and start using it. If you want to deploy it on-premises, you need to have a basic understanding of the cloud and how it works. The tool has a step-by-step guide on its website, which is very nice, and I still use it because it is super simple and easy to understand.

    We haven't actually used the AI features yet because we're not quite sure how to apply them in our company. We work on legal transcripts, and those have to be produced word for word, even if the person speaking says certain things incorrectly. We have to maintain 100 percent accuracy. I am not sure how we can apply the AI features in the tool, but we are always looking at the AI aspect in our company.

    Deepgram offers some clear advantages over other applications. If you want a tool that produces transcripts very fast, nothing comes even close to Deepgram.

    I rate the tool an eight or nine out of ten.

    Which deployment model are you using for this solution?

    Hybrid Cloud
    Ariel Lindenfeld

    Excellent quality, great speech-to-text recognition, and responsive support

    Reviewed on May 30, 2024
    Review from a verified AWS customer

    What is our primary use case?

    We primarily use the solution for transcribing speech to text. We use it to record phone calls and meetings and then transcribe them.

    What is most valuable?

    The quality of the product is the most valuable aspect. The recognition of industry-specific terminology, phrases, and abbreviations is really important for us. We were able to get a good level of industry specificity with Deepgram.

    What needs improvement?

    Two things come to mind for improvement. Maybe they have fixed these, or maybe there is something new, and we haven't implemented it yet. 

    One improvement could be dual-channel audio. We've had issues in the past where it generates the transcript, and a lot of the text is duplicated. I understand why it would happen. It's an audio file with more than one channel of the same speaker, which is what may cause the duplicated text. That said, it would be great either to have a way for Deepgram to realize that it's basically the same audio on two channels and only transcribe one of them, or at least to give us a warning that it's happening. We've found workarounds; however, a better solution from Deepgram's side would be great.
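
    As an illustration of the kind of client-side workaround mentioned above, the sketch below checks whether two channels carry essentially the same audio before a file is submitted, so that only one channel needs to be transcribed. It is an assumption-laden example, not the reviewer's actual workaround and not a Deepgram feature: the function names, the interleaved sample layout, and the 0.99 threshold are all hypothetical choices.

```python
import math


def split_interleaved(samples):
    """Split interleaved stereo samples [L0, R0, L1, R1, ...] into two lists."""
    return samples[0::2], samples[1::2]


def channel_similarity(left, right):
    """Return the normalized correlation (-1.0 to 1.0) of two equal-length channels."""
    if len(left) != len(right) or not left:
        raise ValueError("channels must be non-empty and the same length")
    dot = sum(a * b for a, b in zip(left, right))
    norm = math.sqrt(sum(a * a for a in left)) * math.sqrt(sum(b * b for b in right))
    # A silent channel has zero energy; report 0.0 rather than divide by zero.
    return dot / norm if norm else 0.0


def looks_duplicated(left, right, threshold=0.99):
    """Heuristic: treat the channels as duplicates when they are almost perfectly correlated."""
    return channel_similarity(left, right) >= threshold
```

    When `looks_duplicated` returns true, a pipeline could downmix to mono or send only one channel; the right threshold would need tuning on real recordings.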

    The other issue comes up when some changes are made on their end, and we want to test them. We've had one to two instances where they tell us that we have access, and we try to test something out, and it turns out we don't. When that happens, then they have to fix something on their end. It's not a big deal. We have a Slack channel with them where we can quickly touch base. We let them know, and they will get back to us and fix the access. It's not something we're doing very often. 

    For how long have I used the solution?

    We've been using the solution for about a year and a half.

    What do I think about the stability of the solution?

    The solution is stable. We've had no real issues. The only issues that come up more in the testing space are when we're looking at a new model that we want to test out. The issue in that case is more about access where we think we have it and then we don't. That's not a stability issue, however. 

    What do I think about the scalability of the solution?

    The scalability is significant. We have grown significantly with it and I don't see any reason why we can't keep growing with it. It doesn't require many people to have it up and running. The output that we get from Deepgram is used by a large number of people. Day to day, it's really very hands-off. 

    How are customer service and support?

    Either before we set up the contract with Deepgram, or right after, they set up a Slack channel with us that had a bunch of people from their end - including customer service and their operations team. We had a call or two with them to get started, and we have been able to quickly correspond with them to get answers or send them details via Slack. That's been very helpful. Even now, we can jump onto the channel and send them questions. They send out updates via email as well.

    I can't say if having a Slack channel with them would be considered an extra cost or if it is available to any customer or not. Maybe if you have a premium program or custom model, it's definitely included. 

    How would you rate customer service and support?

    Positive

    Which solution did I use previously and why did I switch?

    Previously, we were using human transcription, which was a lot more expensive.

    How was the initial setup?

    The initial setup isn't any different from any other speech-to-text solution out there. My understanding is that it wouldn't be extremely difficult. 

    It was implemented by one of our developers in-house, and I was involved as well. The Deepgram team was very supportive. It depends on the use case; in our case, it doesn't require a big team to set up or manage.

    What about the implementation team?

    We had our in-house development team involved to get it up and running.

    What was our ROI?

    The ROI has been excellent. The cost is night and day compared to the cost of human transcription. We're spending maybe a tenth of the cost we would if we were still doing manual transcriptions.

    What's my experience with pricing, setup cost, and licensing?

    The pricing was very good. Although the competitors also would have saved us a lot of money, we were mainly looking for the right level of quality of the transcript. 

    Which other solutions did I evaluate?

    We compared several other automated transcription services similar to Deepgram a little over two years ago. We're constantly testing out others. We find Deepgram to be the highest quality. That was really important to us.

    Amazon AWS has some sort of solution that we tested out. I don't remember the names of the others. We might have tested something from Google and some other lesser-known options.

    Being able to access it and send recordings was the same level of work for our development team. So it came to the quality and the ability to build a custom model. We gave them some data and some audio file samples that they used in order to come up with a custom model that would perform even better for us. 

    What other advice do I have?

    We're a Deepgram customer. 

    We use Deepgram for speech-to-text. There are different models you can use in order to do that. We have a custom model and we have also tested out some of their other models as well. 

    We're really happy with it. Technology in this space is constantly changing, and we see what's happening with ChatGPT, for example. Everyone uses different kinds of platforms for transcription. Deepgram also has a lot of other solutions we're not using, and it supports other languages, which is not how we're using it. However, it will likely continue to improve itself and align with all of the advances in the space. Deepgram has been really great for our use case. 

    I interact with Deepgram's team and the team internally that has implemented it. I don't have hands-on experience with it myself. I continue to review the quality of the transcripts, which is what we're using it for. That said, my understanding is that the ease of use would depend on your experience and your skill level. It is in line with everything else out there. 

    I'd recommend the solution to others. I'd rate it nine out of ten. We evaluated and compared different solutions based on the quality of speech-to-text and implementation. For quality, Deepgram definitely came out above everything else. The implementation also went well. Support has been good - not that we've needed it too often. Compared to what we had before, which wasn't a technological solution, the difference is night and day. The other automated transcription services we've looked at in the past few years just weren't good enough. Deepgram really offers great quality. 

    Which deployment model are you using for this solution?

    Public Cloud

    If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

    Amazon Web Services (AWS)
    Khemit Verma

    Used to transcribe videos, but does not properly identify the number of speakers

    Reviewed on May 24, 2024
    Review provided by PeerSpot

    What is our primary use case?

    I run Deepgram on my local system to transcribe videos.

    What is most valuable?

    The speed of the solution for transcribing videos is good.

    What needs improvement?

    I need to transcribe my videos to text chat, but there are some issues when I run Deepgram. The solution does not properly identify the number of speakers. For example, Deepgram only identifies two speakers out of three or four speakers in some videos.

    The solution also makes some spelling and English grammar mistakes. Deepgram does not properly identify some specific words in a sentence.

    For how long have I used the solution?

    I have been using Deepgram for one to two months.

    What do I think about the stability of the solution?

    We haven't faced any breakdowns or bugs with the solution.

    What do I think about the scalability of the solution?

    My team consists of two members who use Deepgram.

    How are customer service and support?

    The solution’s technical support is average. I talked to the technical support team regarding an issue where the solution couldn't identify the exact number of speakers. The support team asked me to use certain parameters, but the results were inaccurate. I used all the parameters suggested by the support team, but the speakers were still not identified clearly. However, other services could properly identify the speakers of the videos.

    What's my experience with pricing, setup cost, and licensing?

    The solution’s pricing is cheap.

    What other advice do I have?

    I chose to use Deepgram after researching it on Google and finding some good feedback that the solution had good APIs. It's easy for a new user to learn to use the solution.

    I would not recommend Deepgram to other users because it does not properly identify video communication. If you compare it with other APIs, you can easily find that it does not properly identify some words, exclamation marks, full stops, etc. These are small mistakes, but the tool still makes them. Also, the solution does not properly identify the speakers. Users should check other APIs before choosing Deepgram.

    Overall, I rate the solution a six out of ten.
