Automatic Speech Recognition

Sold by: HARMAN Digital Transformation Solutions

Enhanced voice experience with accurate, real-time, scalable speech recognition

Overview

This solution uses transformer-based models to transcribe speech audio. It is designed to scale for businesses of all sizes and to deliver fast, accurate transcriptions while making efficient use of CPU instances, which keeps costs down and lets you focus on analyzing the content of your audio recordings.

The solution supports multiple languages and provides both batch and real-time inference. Batch mode processes large audio datasets so that transcriptions are available for further analysis. Real-time inference converts spoken words into written text as requests arrive, giving immediate access to the transcribed content.

Highlights

The eNova Speech Recognition model is trained for speech recognition and transcription tasks. It transcribes speech audio into text in the language in which it is spoken. This model version is tuned for CPU use and is best suited to short audio segments.

**Supported Languages:**
* 'en_us': English
* 'de_de': German
* 'es_419': Spanish
* 'ko_kr': Korean

**Supported Tasks:**
* 'transcribe_srt': transcription with SRT output
* 'transcribe': plain transcription

The solution can be used in industries such as media and entertainment, software, mobile applications, hospitality, healthcare, and legal to provide transcription and closed captioning. It can also be used to build solutions that require speech to text, such as voice bots and virtual assistants.

Need more machine learning, deep learning, NLP, and quantum computing solutions? Reach out to us at Harman DTS.

Pricing

Pricing is based on actual usage, with charges varying according to how much you consume. Subscriptions have no end date and may be canceled any time.

Additional AWS infrastructure costs may apply. Use the AWS Pricing Calculator to estimate your infrastructure costs.

Usage costs (19)

Dimension	Description	Cost/host/hour
ml.m4.16xlarge Inference (Batch) Recommended	Model inference on the ml.m4.16xlarge instance type, batch mode	$40.00
ml.m5.2xlarge Inference (Real-Time) Recommended	Model inference on the ml.m5.2xlarge instance type, real-time mode	$10.00
ml.c5.18xlarge Inference (Batch)	Model inference on the ml.c5.18xlarge instance type, batch mode	$40.00
ml.c5.xlarge Inference (Batch)	Model inference on the ml.c5.xlarge instance type, batch mode	$40.00
ml.m4.2xlarge Inference (Batch)	Model inference on the ml.m4.2xlarge instance type, batch mode	$40.00
ml.m5.4xlarge Inference (Real-Time)	Model inference on the ml.m5.4xlarge instance type, real-time mode	$10.00
ml.m5.12xlarge Inference (Real-Time)	Model inference on the ml.m5.12xlarge instance type, real-time mode	$10.00
ml.g5.xlarge Inference (Real-Time)	Model inference on the ml.g5.xlarge instance type, real-time mode	$10.00
ml.c5.large Inference (Real-Time)	Model inference on the ml.c5.large instance type, real-time mode	$10.00
ml.m5.xlarge Inference (Real-Time)	Model inference on the ml.m5.xlarge instance type, real-time mode	$10.00
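
As a rough illustration of how the per-host, per-hour rates in the table add up, the sketch below multiplies rate, host count, and running time. The function name and the example figures for hosts and hours are hypothetical; the two rates are the ones listed above. The estimate covers the software charge only and does not include the additional AWS infrastructure costs.

```python
# Software rates from the pricing table above, in USD per host per hour.
RATES_USD_PER_HOST_HOUR = {
    "batch": 40.00,      # every batch row listed above
    "real-time": 10.00,  # every real-time row listed above
}


def estimate_software_cost(mode: str, hosts: int, hours: float) -> float:
    """Return the estimated software charge (infrastructure excluded)."""
    if mode not in RATES_USD_PER_HOST_HOUR:
        raise ValueError(f"unknown mode: {mode!r}")
    if hosts < 1 or hours < 0:
        raise ValueError("hosts must be >= 1 and hours must be >= 0")
    return RATES_USD_PER_HOST_HOUR[mode] * hosts * hours


if __name__ == "__main__":
    # Hypothetical workload: a 3-hour batch job on 2 hosts.
    print(estimate_software_cost("batch", hosts=2, hours=3))
    # Hypothetical workload: one real-time endpoint left running for a day.
    print(estimate_software_cost("real-time", hosts=1, hours=24))
```

Because a real-time endpoint is billed for as long as it runs, the second case grows with wall-clock time rather than with the number of requests.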

Vendor refund policy

We do not provide any usage related refunds at this time.

Legal

Vendor terms and conditions

Upon subscribing to this product, you must acknowledge and agree to the terms and conditions outlined in the vendor's End User License Agreement (EULA).

Content disclaimer

Vendors are responsible for their product descriptions and other product content. AWS does not warrant that vendors' product descriptions or other product content are accurate, complete, reliable, current, or error-free.

Usage information

Delivery details

Amazon SageMaker model

An Amazon SageMaker model package is a pre-trained machine learning model ready to use without additional training. Use the model package to create a model on Amazon SageMaker for real-time inference or batch processing. Amazon SageMaker is a fully managed platform for building, training, and deploying machine learning models at scale.

Deploy the model on Amazon SageMaker AI using the following options:

Real-time inference

Deploy the model as an API endpoint for your applications. When you send data to the endpoint, SageMaker processes it and returns the results in the API response. The endpoint runs continuously until you delete it, and you are billed for software and SageMaker infrastructure costs while it runs. AWS Marketplace models don't support Amazon SageMaker Asynchronous Inference. For more information, see Deploy models for real-time inference.
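
A minimal sketch of calling a deployed endpoint, assuming the boto3 SageMaker Runtime client and its `invoke_endpoint` call. The function name, the endpoint name, and the region are hypothetical placeholders. The listing does not document the response schema, so the sketch returns the raw response text instead of parsing it.

```python
import json


def invoke_transcription(client, endpoint_name: str, payload: dict) -> str:
    """Send a JSON payload to a SageMaker endpoint and return the raw reply.

    `client` is expected to behave like boto3.client("sagemaker-runtime").
    """
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    # The response body is a stream; its schema is not documented in the
    # listing, so it is returned as text without further interpretation.
    return response["Body"].read().decode("utf-8")


# Example wiring (not executed here; names are placeholders):
#   import boto3
#   client = boto3.client("sagemaker-runtime", region_name="us-east-1")
#   text = invoke_transcription(client, "my-asr-endpoint", payload)
```

Passing the client in as an argument keeps the function easy to exercise with a stub before any billable endpoint exists.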

Batch transform

Deploy the model to process batches of data stored in Amazon Simple Storage Service (Amazon S3). SageMaker runs the job, processes your data, and returns results to Amazon S3. When complete, SageMaker stops the model. You're billed for software and SageMaker infrastructure costs only during the batch job. Duration depends on your model, instance type, and dataset size. AWS Marketplace models don't support Amazon SageMaker Asynchronous Inference. For more information, see Batch transform for inference with Amazon SageMaker AI.
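
The sketch below assembles the request for a batch transform job in the shape accepted by the boto3 SageMaker client's `create_transform_job` call. The helper name, job name, model name, and S3 URIs are hypothetical. It assumes that a SageMaker model has already been created from the subscribed model package and that the input files are JSON requests. The default instance type is the batch option marked as recommended in the pricing table.

```python
def build_transform_request(
    job_name: str,
    model_name: str,
    input_s3_uri: str,
    output_s3_uri: str,
    instance_type: str = "ml.m4.16xlarge",
    instance_count: int = 1,
) -> dict:
    """Build keyword arguments for SageMaker's create_transform_job."""
    for uri in (input_s3_uri, output_s3_uri):
        if not uri.startswith("s3://"):
            raise ValueError(f"not an S3 URI: {uri!r}")
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",  # process every object under the prefix
                    "S3Uri": input_s3_uri,
                }
            },
            "ContentType": "application/json",  # assumption: JSON request files
        },
        "TransformOutput": {"S3OutputPath": output_s3_uri},
        "TransformResources": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
        },
    }


# Example wiring (not executed here; names are placeholders):
#   import boto3
#   request = build_transform_request(
#       "asr-batch-001", "my-asr-model",
#       "s3://my-bucket/asr/input/", "s3://my-bucket/asr/output/")
#   boto3.client("sagemaker").create_transform_job(**request)
```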

Version release notes

Bug fixes and enhancements

Additional details

Inputs

Summary: The model input is a JSON request with the following payload fields:

'wv_data' = base64-encoded audio

'task' = the output to produce: "transcribe" for plain transcribed text, or "transcribe_srt" for a subtitles file (SRT format)

'lang' = language of the transcription ('en_us': 'English', 'de_de': 'German', 'es_419': 'Spanish', 'ko_kr': 'Korean')
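
The three fields above can be assembled as in the following sketch. The field names and the allowed values come from this listing; the function name and the defaults are hypothetical, and the validation is a convenience added here, not documented vendor behavior.

```python
import base64
import json

SUPPORTED_TASKS = {"transcribe", "transcribe_srt"}
SUPPORTED_LANGS = {"en_us", "de_de", "es_419", "ko_kr"}


def build_payload(audio_bytes: bytes, task: str = "transcribe",
                  lang: str = "en_us") -> str:
    """Return the JSON request body for one audio file."""
    if task not in SUPPORTED_TASKS:
        raise ValueError(f"unsupported task: {task!r}")
    if lang not in SUPPORTED_LANGS:
        raise ValueError(f"unsupported language: {lang!r}")
    return json.dumps({
        # JSON cannot carry raw bytes, so the audio is base64-encoded text.
        "wv_data": base64.b64encode(audio_bytes).decode("ascii"),
        "task": task,
        "lang": lang,
    })


# Example (the file name is a placeholder):
#   with open("sample.wav", "rb") as f:
#       body = build_payload(f.read(), task="transcribe_srt", lang="de_de")
```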

Limitations for input type:
* The input audio data should consist of complete audio files, rather than raw PCM data (wav, flac, RAW).
* The input audio sample should be 16000 Hz or above.
* The maximum audio file size for real-time inference is 20 MB and for batch transform 50 MB per file.
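
A local pre-flight check can catch files that break the sample-rate and size limits before they are sent. The sketch below handles WAV input only, using Python's standard `wave` module; FLAC and RAW files would need another reader. The function name is hypothetical, and reading "MB" as 1024 × 1024 bytes is an assumption, since the listing does not define the unit.

```python
import io
import wave
from typing import List

MIN_SAMPLE_RATE_HZ = 16000
# Assumption: 1 MB = 1024 * 1024 bytes; the listing does not say.
MAX_BYTES = {"real-time": 20 * 1024 * 1024, "batch": 50 * 1024 * 1024}


def check_wav(data: bytes, mode: str = "real-time") -> List[str]:
    """Return a list of problems found; an empty list means none."""
    problems = []
    if len(data) > MAX_BYTES[mode]:
        problems.append(f"file exceeds the {mode} size limit")
    try:
        with wave.open(io.BytesIO(data), "rb") as wav:
            rate = wav.getframerate()
    except (wave.Error, EOFError):
        # No valid WAV header: possibly raw PCM or another format.
        problems.append("not a complete WAV file")
        return problems
    if rate < MIN_SAMPLE_RATE_HZ:
        problems.append(f"sample rate {rate} Hz is below {MIN_SAMPLE_RATE_HZ} Hz")
    return problems
```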

Input MIME type: audio/x-wav, application/json

Real-time inference sample input data

https://github.com/HDTS-user/lifeware-speech-recognition/tree/main/sample_input

Batch transform sample input data

https://github.com/HDTS-user/lifeware-speech-recognition/tree/main/sample_input

Input data descriptions

The following table describes supported input data fields for real-time inference and batch transform.

Field name	Description	Constraints	Required
'wv_data'	Base64-encoded audio (wav, flac, RAW)	Type: FreeText	Yes
'task'	"transcribe" or "transcribe_srt"	Type: FreeText	Yes
'lang'	'en_us' (English), 'de_de' (German), 'es_419' (Spanish), or 'ko_kr' (Korean)	Type: FreeText	Yes

Resources

Vendor resources

Harman DTS website

eNova - Harman's Conversational AI platform

eNova use-cases

Support

Vendor support

Business-hours email support: marketplaceSupp@harman.com

AWS infrastructure support

AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.
