AI has found its voice, and startups are listening: How disruptors can capitalize on one of 2025’s top emerging markets.
Reasoning models have dominated discourse around AI in recent years, but 2025 has seen a new modality step into the spotlight—voice. It wasn’t long ago that the concept of interfacing with technology through speech was reserved for science fiction novels rather than real life—back then, your phone was something you spoke on, not to. Things began to change in the 2010s with the success of products like Amazon Alexa, Siri, Google Assistant, and Bixby, which helped large parts of the general population get comfortable chatting with their devices.
Fast-forward to the AI space race we’re living in today, and improvements in performance and latency have sent the potential applications of voice AI skyrocketing. From call centers to hospitality, healthcare, and language learning, new possibilities seem to present themselves by the day. That sort of thing doesn’t go unnoticed, and a flywheel of investor interest, startup innovation, and changing consumer behavior has left voice AI primed to make some serious vibrations in the months and years ahead.
Conversational AI—a trend worth talking about
People used to believe that telephones attracted evil spirits. They got over it, and now we all carry one around in our pockets. Similarly, the growing popularity of at-home and on-device voice assistants has helped normalize voice interaction with technology and even shape consumer behavior, particularly among younger generations. According to VML’s Future Shopper Report, 46 percent of global consumers said they owned a smart assistant in 2023, rising to 49 percent in 2024. Meanwhile, 23 percent of global consumers say they regularly use voice-activated smart assistants to make purchases, and a further 19 percent have used them to order products in the past.
Alongside shifting consumer habits, recent advances in the core technologies underpinning voice are opening the door to future value creation. In 2024, orchestrated speech systems, which chain speech-to-text, a large language model, and text-to-speech to listen, reason, and respond in human-like conversation, made a breakthrough, but that was only the start. Dedicated speech-to-speech models, which skip the intermediate text representation that traditional voice AI pipelines rely on, also entered the market (think ChatGPT’s voice mode). Couple this with the broader rise of agentic AI, and voice has rapidly moved beyond a consumer novelty into a viable modality for enterprise-grade solutions.
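To make the architectural difference concrete, here is a minimal Python sketch. The stage functions (`stt`, `llm`, `tts`, `s2s`) are hypothetical stand-ins passed in as callables, not any vendor’s API, and the latency figures are made-up placeholders. The cascaded turn chains three models through an intermediate text representation, while the speech-to-speech turn is a single call from audio to audio.

```python
from typing import Callable, Dict, Tuple

# Hypothetical model interfaces. Real services usually stream audio in chunks
# rather than passing whole byte strings.
SpeechToText = Callable[[bytes], str]
LanguageModel = Callable[[str], str]
TextToSpeech = Callable[[str], bytes]
SpeechToSpeech = Callable[[bytes], bytes]


def cascaded_turn(audio_in: bytes, stt: SpeechToText, llm: LanguageModel,
                  tts: TextToSpeech) -> Tuple[str, str, bytes]:
    """Orchestrated pipeline: listen (STT), reason (LLM), respond (TTS)."""
    transcript = stt(audio_in)      # speech -> text
    reply_text = llm(transcript)    # text -> text
    audio_out = tts(reply_text)     # text -> speech
    return transcript, reply_text, audio_out


def speech_to_speech_turn(audio_in: bytes, s2s: SpeechToSpeech) -> bytes:
    """One model maps input audio directly to output audio, with no text hand-off."""
    return s2s(audio_in)


def cascaded_latency_ms(stage_latencies_ms: Dict[str, float]) -> float:
    """If each stage waits for the previous one to finish, the latencies add up."""
    return sum(stage_latencies_ms.values())


if __name__ == "__main__":
    # Toy stubs that treat the "audio" bytes as UTF-8 text, purely for illustration.
    stt = lambda audio: audio.decode("utf-8")
    llm = lambda text: f"You said: {text}"
    tts = lambda text: text.encode("utf-8")
    s2s = lambda audio: b"You said: " + audio

    print(cascaded_turn(b"hello", stt, llm, tts))
    print(speech_to_speech_turn(b"hello", s2s))
    # Placeholder numbers, not measurements of any real system.
    print(cascaded_latency_ms({"stt": 150, "llm": 300, "tts": 100}))
```

The additive latency model is a simplification. Production pipelines often stream partial results between stages so that they overlap, which is one reason latency budgets matter so much in cascaded designs.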
Startup activity that speaks volumes
As the voice AI market continues to expand, startups are rushing to stake their claim. At Y Combinator alone, the share of each batch building with voice technology grew from 13 percent in Winter 2024 (W24) to 14 percent in Summer 2024 (S24) and 22 percent in Fall 2024 (F24). Disruptors intent on capturing the voice AI opportunity need to tune in to what investors are looking for.
According to Andreessen Horowitz (a16z), investors and customers are primarily interested in voice AI solutions that target industries where the phone is typically used for customer demos, where phone contact is more effective because of regulations, or where it delivers a higher success rate than other ways of engaging with customers. High-value industries include logistics, debt collection, and healthcare, and enterprises prioritize solutions that provide clear, measurable outcomes. Enterprises also expect impressive ROI (we’re talking 30 to 50 percent cost reductions) as well as seamless integration with existing systems such as Voice over Internet Protocol (VoIP).
Disruptors developing voice AI solutions need to address multiple challenges to attract investment and stand out in the market. Voice assistants collect and process personal data, and businesses (and their customers) won’t compromise on privacy and regulatory compliance. Competition is also intensifying: startups face an influx of both horizontal and vertical-focused voice AI products, as well as developer platforms that let internal teams build their own voice agents. That makes go-to-market speed critical.
Agentic voice solutions can scale rapidly once implemented, but disruptors may need to navigate hurdles when dealing with more traditional enterprises—this is where measurable outcomes and impressive ROI really matter. There’s also the question of monetization. As a16z notes, most voice products were initially priced per minute. However, as the cost of underlying models has decreased, competitors have started to undercut one another. Going forward, monetization strategies are likely to combine platform fees with usage-based components. To overcome these challenges, startups need to work collaboratively with reliable technology partners.
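As a rough illustration of the shift in pricing described above, the sketch below compares a pure per-minute bill with a hybrid bill that combines a flat platform fee and a usage-based component. The function names, the optional allowance of included minutes, and every price are hypothetical; amounts are kept in integer cents to avoid floating-point rounding.

```python
def per_minute_bill_cents(minutes: int, rate_cents_per_minute: int) -> int:
    """Pure usage pricing: every minute is billed at the same rate."""
    return minutes * rate_cents_per_minute


def hybrid_bill_cents(minutes: int, platform_fee_cents: int,
                      rate_cents_per_minute: int, included_minutes: int = 0) -> int:
    """Flat platform fee plus usage charged only above an included allowance."""
    billable_minutes = max(0, minutes - included_minutes)
    return platform_fee_cents + billable_minutes * rate_cents_per_minute


if __name__ == "__main__":
    # Hypothetical month with 10,000 call minutes.
    print(per_minute_bill_cents(10_000, 10))  # 100000 cents, i.e. $1,000
    print(hybrid_bill_cents(10_000, 50_000, 5, included_minutes=2_000))  # 50000 + 8000 * 5 = 90000 cents
```

Under this structure, a cut to the per-minute rate only changes the usage component, while the platform fee stays fixed.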
Over a decade of pioneering voice AI technology
Amazon Web Services (AWS) has a long track record of innovation in voice AI, starting with the launch of Amazon Alexa in 2014, which helped pioneer mainstream voice interaction. Since then, AWS has continuously advanced the space with services like Amazon Transcribe, Amazon Polly, and Amazon Lex. In 2025, Amazon introduced Alexa+, which integrates generative AI to enable more natural, contextual conversations.
Today, AWS offers cutting-edge models that push the boundaries of real-time, human-like voice interaction, such as Amazon Nova Sonic, a speech-to-speech foundation model now available in Amazon Bedrock. Amazon Nova Sonic can be used across a broad set of applications, including customer support call automation, outbound marketing, voice-enabled personal assistants and agents, and interactive education and language learning. AWS also offers cost-efficient silicon purpose-built for AI workloads in its AWS Trainium and AWS Inferentia chips.
A springboard for future market leaders
Beyond technology, AWS also provides strategic expertise and programs to help startups build faster and smarter. For example, the AWS Generative AI Accelerator is designed to support the next wave of AI disruptors, giving them the opportunity to learn from program partners such as NVIDIA and Mistral AI. The 10-week program is part of a broader $230 million commitment by AWS to help startups around the world rapidly develop generative AI applications. Participating startups can receive up to $1 million, along with technical and commercial guidance and access to millions of active customers through AWS Marketplace.
The AWS Generative AI Accelerator has already helped innovative startups become leaders in the voice AI space. Take Cartesia, a voice AI platform provider specializing in real-time, multimodal intelligence. The company builds on state space models (SSMs), a breakthrough AI architecture originally pioneered by its founding team during their PhD studies at Stanford.
Today, Cartesia is recognized for its industry-leading enterprise text-to-speech model for real-time conversations, which delivers human-quality voice generation with just 40 milliseconds of latency. The company’s flagship model, Sonic, is two to three times faster than alternatives and enables businesses to deploy ultra-realistic voice agents across any industry with perfect accuracy on complex phrases.
Now we’re talking
The voice AI market is expanding at pace, and competition is heating up. Going forward, expect a proliferation of new speech-to-speech model APIs and voice agent platforms from multiple providers, as well as growing trust in voice agents’ ability to complete complex, multi-step tasks across verticals. Customer and investor expectations are high, but with the right strategy and support, startups have a lot to gain in the voice AI space. Partnering with AWS can help disruptors build voice AI solutions with cutting-edge technology on infrastructure built for AI workloads. Programs like the AWS Generative AI Accelerator can also give startups access to proven expertise and, crucially, reduce time-to-market while extending customer reach.