Overview
The MiniMax Speech model is a cutting-edge AI speech model that excels in voice cloning and text-to-speech (TTS) synthesis.
It is ranked #1 on the Artificial Analysis Text-to-Speech leaderboard, delivering industry-leading speech quality and intelligibility: https://artificialanalysis.ai/text-to-speech/arena?tab=leaderboard
Key Capabilities
Advanced Voice Cloning: Accurately replicates unique timbre, intonation, and speaking style
High-Quality Output: Produces natural-sounding speech with exceptional fidelity
Multilingual Support:
- Speech-02 model supports 30+ languages with diverse accents and emotional expressions
- Speech-2.5 model supports 50+ languages with diverse accents and emotional expressions
Versatile Styles: Seamlessly switches between formal, casual, and expressive tones
Core Innovation
Our breakthrough Intrinsic Zero-Shot Text-to-Speech with Learnable Speaker Encoder enables:
- Seamless cooperation between voice style and content generation
- Virtually unlimited combinations of language, accent, and voice
- Enhanced synthesis quality through a unified AR Transformer architecture
Applications
Perfect for audio production, virtual assistants, call centers, content creation, and media localization. Generate customizable voice content at scale while reducing production costs.
Resources
Technical Report: https://minimax-ai.github.io/tts_tech_report/
Experience the Technology: https://www.minimax.io/audio
Highlights
- Ranked #1 on the Artificial Analysis Text-to-Speech leaderboard, delivering industry-leading speech quality and intelligibility.
- High-fidelity voice cloning and text-to-speech synthesis capable of replicating unique timbres and expressive speech.
- Supports multi-style, customizable voice generation for applications in audio production, virtual assistants, call centers, and content creation.
Details
Pricing
Dimension | Description | Cost/month |
---|---|---|
Starter | The MiniMax speech-01/02 series supports up to 10 requests per minute (RPM = 10). Each account includes 100,000 TTS character credits for speech synthesis and allows up to 10 voice slots for different voice profiles or styles. This setup is ideal for small to medium-scale speech generation, providing stable performance and flexible voice customization. | $5.00 |
Standard | The MiniMax speech-01/02/2.5 series supports up to 50 requests per minute (RPM = 50). Each account includes 300,000 TTS character credits for speech synthesis and provides up to 100 voice slots to manage different voice profiles or styles. This configuration is suitable for medium to large-scale speech generation, offering higher throughput and greater flexibility for voice customization. | $30.00 |
Pro | The MiniMax speech-01/02/2.5 series supports up to 200 requests per minute (RPM = 200). Each account includes 1,100,000 TTS character credits for speech synthesis and offers up to 250 voice slots for managing diverse voice profiles and styles. This configuration is designed for large-scale or high-demand speech generation, delivering high throughput and extensive flexibility for advanced voice customization. | $99.00 |
Scale | The MiniMax speech-01/02/2.5 series supports up to 500 requests per minute (RPM = 500). Each account includes 3,300,000 TTS character credits for speech synthesis and provides up to 500 voice slots for managing various voice profiles and styles. This configuration is optimized for enterprise-level or large-scale speech generation, offering very high throughput and extensive flexibility for complex voice customization needs. | $249.00 |
Business | The MiniMax speech-01/02/2.5 series under the Business plan supports up to 800 requests per minute (RPM = 800). Each account includes 20,000,000 TTS character credits for speech synthesis and allows up to 800 voice slots for managing a wide range of custom voice profiles and styles. This configuration is built for enterprise-scale and high-volume applications, delivering exceptional throughput, stability, and flexibility for advanced voice generation needs. | $999.00 |
Customer Pricing | This plan offers priority access to model updates, unlimited requests per minute (RPM), and exclusive guarantees for security and stability. Users also gain more voice options and enhanced voice cloning capabilities, making it ideal for enterprises or customers with the highest demands for scalability, customization, and reliability. | $0.01 |
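To compare the fixed tiers above, the following is a minimal Python sketch that computes each plan's effective cost per million characters and picks the cheapest plan meeting a given workload. The figures are copied from the table; the names `Plan`, `PLANS`, `usd_per_million_chars`, and `cheapest_plan` are hypothetical helpers, not part of any MiniMax or AWS API. It assumes the character credits are a monthly allotment with no overage billing, which the listing does not state explicitly, and it leaves out the Customer Pricing tier because its limits are not quantified.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Plan:
    name: str
    rpm: int                    # requests per minute
    characters: int             # included TTS character credits
    voice_slots: int
    usd_per_month: float
    supports_speech_2_5: bool   # Starter lists only the speech-01/02 series


# Values transcribed from the pricing table above.
PLANS = [
    Plan("Starter", 10, 100_000, 10, 5.00, False),
    Plan("Standard", 50, 300_000, 100, 30.00, True),
    Plan("Pro", 200, 1_100_000, 250, 99.00, True),
    Plan("Scale", 500, 3_300_000, 500, 249.00, True),
    Plan("Business", 800, 20_000_000, 800, 999.00, True),
]


def usd_per_million_chars(plan: Plan) -> float:
    """Effective price per 1,000,000 included characters."""
    return plan.usd_per_month / plan.characters * 1_000_000


def cheapest_plan(chars_needed: int,
                  rpm_needed: int = 0,
                  slots_needed: int = 0,
                  need_speech_2_5: bool = False) -> Optional[Plan]:
    """Return the lowest-priced plan that covers every requirement, or None."""
    candidates = [
        p for p in PLANS
        if p.characters >= chars_needed
        and p.rpm >= rpm_needed
        and p.voice_slots >= slots_needed
        and (p.supports_speech_2_5 or not need_speech_2_5)
    ]
    return min(candidates, key=lambda p: p.usd_per_month, default=None)


if __name__ == "__main__":
    for plan in PLANS:
        print(f"{plan.name:<9} ${usd_per_million_chars(plan):.2f} per 1M characters")
    choice = cheapest_plan(chars_needed=250_000, rpm_needed=30)
    print("Cheapest fit:", choice.name if choice else "none (consider custom pricing)")
```

Under these assumptions, the per-character rate is not monotonic across tiers: Starter and Business both work out to roughly $50 per million characters, while Standard is about $100. The higher tiers are therefore mainly differentiated by RPM and voice slots rather than by character price alone.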
Vendor refund policy
Not currently supported.
Delivery details
Software as a Service (SaaS)
SaaS delivers cloud-based software applications directly to customers over the internet. You can access these applications through a subscription model. You will pay recurring monthly usage fees through your AWS bill, while AWS handles deployment and infrastructure management, ensuring scalability, reliability, and seamless integration with other AWS services.
Support
Vendor support
Email: api@hailuoai.com
Description: Customers receive 24x7 email support with a guaranteed response within 24 hours. Technical assistance includes setup guidance, troubleshooting, and usage tips for MiniMax Speech-02.
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.