Build real-time conversational AI experiences using Amazon Nova Sonic and LiveKit

The rapid growth of generative AI technology has been a catalyst for business productivity growth, creating new opportunities for greater efficiency, enhanced customer service experiences, and more successful customer outcomes. Today’s generative AI advances are helping existing technologies achieve their long-promised potential. For example, voice-first applications have been gaining traction across industries for years—from customer service to education to personal voice assistants and agents. But early versions of this technology struggled to interpret human speech or mimic real conversation. Building real-time, natural-sounding, low-latency voice AI has until recently remained complex, especially when working with streaming infrastructure and speech foundation models (FMs).

The rapid progress of conversational AI technology has led to the development of powerful models that address the historical challenges of traditional voice-first applications. Amazon Nova Sonic is a state-of-the-art speech-to-speech FM designed to build real-time conversational AI applications in Amazon Bedrock. This model offers industry-leading price-performance and low latency. The Amazon Nova Sonic architecture unifies speech understanding and generation into a single model, to enable real, human-like voice conversations in AI applications.

Amazon Nova Sonic accommodates the breadth and richness of human language. It can understand speech in different speaking styles and generate speech in expressive voices, including both masculine-sounding and feminine-sounding voices. Amazon Nova Sonic can also adapt the patterns of stress, intonation, and style of the generated speech response to align with the context and content of the speech input. Additionally, Amazon Nova Sonic supports function calling and knowledge grounding with enterprise data using Retrieval-Augmented Generation (RAG). To further simplify the process of getting the most from this technology, Amazon Nova Sonic is now integrated with LiveKit Agents, a widely used framework that enables developers to build real-time audio, video, and data communication applications. This integration makes it possible for developers to build conversational voice interfaces without needing to manage complex audio pipelines or signaling protocols. In this post, we explain how this integration works, how it addresses the historical challenges of voice-first applications, and some initial steps to start using this solution.

Solution overview

LiveKit is an open source platform for building voice, video, and physical AI applications that can see, hear, and speak. Designed as a full-stack solution, it offers client SDKs across web, mobile, and backend systems, a WebRTC server for low-latency network transport, and built-in features such as turn detection, agent load balancing, telephony, and third-party integrations. Developers can build robust agent workflows for real-time applications without worrying about the underlying infrastructure, whether self-hosted, deployed to AWS, or running on LiveKit’s cloud service.

Building real-time, voice-first AI apps requires managing complex infrastructure—from audio capture and streaming to signaling, routing, and latency optimization—especially when using bidirectional models like Amazon Nova Sonic. To simplify this, we integrated a Nova Sonic plugin into LiveKit’s Agents framework, eliminating the need to manage custom pipelines or transport layers. LiveKit handles real-time audio routing and session management, while Nova Sonic provides speech understanding and generation. Developers get features like full-duplex audio, voice activity detection, and noise suppression out of the box, so they can focus on designing a great user experience for their AI voice applications.

The following video shows Amazon Nova Sonic and LiveKit in action. You can find the code for this example in the LiveKit Examples GitHub repo.

The following diagram illustrates the solution architecture of Amazon Nova Sonic deployed as a voice agent in the LiveKit framework on AWS.

Diagram illustrates the solution architecture of Amazon Nova Sonic

Prerequisites

To implement the solution, you must have the following prerequisites:

Python version 3.12 or higher
An AWS account with appropriate Identity and Access Management (IAM) permissions for Amazon Bedrock
Access to Amazon Nova Sonic on Amazon Bedrock
A web browser (such as Google Chrome or Mozilla Firefox) with WebRTC support

Deploy the solution

Complete the following steps to get started talking to Amazon Nova Sonic through LiveKit:

Install the necessary dependencies:

brew install livekit livekit-cli
curl -LsSf https://astral.sh/uv/install.sh | sh

uv is a fast, drop-in replacement for pip, used in the LiveKit Agents SDK (you can also choose to use pip).

Set up a new local virtual environment:

uv init sonic_demo
cd sonic_demo
uv venv --python 3.12
uv add livekit-agents python-dotenv 'livekit-plugins-aws[realtime]'

To run the LiveKit server locally, open a new terminal (for example, a new UNIX process) and run the following command:

livekit-server --dev

You must keep the LiveKit server running for the entire duration that the Amazon Nova Sonic agent is running, because it’s responsible for proxying data between parties.

Generate an access token using the following code. The default values for api-key and api-secret are devkey and secret, respectively. When creating an access token for permission to join a LiveKit room, you must specify the room name and user identity.

lk token create \
 --api-key devkey --api-secret secret \
 --join --room my-first-room --identity user1 \
 --valid-for 24h

Create environment variables. You must specify the AWS credentials:

vim .env

// contents of the .env file
AWS_ACCESS_KEY_ID=<aws access key id>
AWS_SECRET_ACCESS_KEY=<aws secret access key>

# if using a permanent identity (e.g. IAM user)
# then session token is optional
AWS_SESSION_TOKEN=<aws session token>
LIVEKIT_API_KEY=devkey
LIVEKIT_API_SECRET=secret

Create the main.py file:

from dotenv import load_dotenv
from livekit import agents
from livekit.agents import AgentSession, Agent, AutoSubscribe
from livekit.plugins.aws.experimental.realtime import RealtimeModel

load_dotenv()

async def entrypoint(ctx: agents.JobContext):
    # Connect to the LiveKit server
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)
    
    # Initialize the Amazon Nova Sonic agent
    agent = Agent(instructions="You are a helpful voice AI assistant.")
    session = AgentSession(llm=RealtimeModel())
    
    # Start the session in the specified room
    await session.start(
        room=ctx.room,
        agent=agent,
    )

if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))

Run the main.py file:

uv run python main.py connect --room my-first-room

Now you’re ready to connect to the agent frontend.

Go to https://agents-playground.livekit.io/.
Choose Manual.
In the first text field, enter ws://localhost:7880.
In the second text field, enter the access token you generated.
Choose Connect.

You should now be able to talk to Amazon Nova Sonic in real time.

If you’re disconnected from the LiveKit room, you will have to restart the agent process (main.py) to talk to Amazon Nova Sonic again.

Clean up

This example runs locally, meaning there are no special teardown steps required for cleanup. You can simply exit the agent and LiveKit server processes. The only cost incurred are the costs of making calls to Amazon Bedrock to talk to Amazon Nova Sonic. After you have disconnected from the LiveKit room, you will no longer incur charges and no AWS resources will remain in use.

Conclusion

Thanks to generative AI, the qualitative benefits long promised by voice-first applications can now be realized. By combining Amazon Nova Sonic with LiveKit’s Agents framework, developers can build real-time, voice-first AI applications with less complexity and faster deployments. The integration reduces the need for custom audio pipelines, so teams can focus on building engaging conversational experiences.

“Our goal with this integration is to simplify the development of voice AI applications,” said Russ d’Sa, CEO and co-founder of LiveKit. “By combining LiveKit’s Agents framework with Nova Sonic’s speech capabilities, we’re helping developers move faster — no need to manage low-level infrastructure, so they can focus on building their applications.”

To learn more about Amazon Nova Sonic, read the AWS News Blog, Amazon Nova Sonic product page, and Amazon Nova Sonic User Guide. To get started with Amazon Nova Sonic in Amazon Bedrock, visit the Amazon Bedrock console.

About the authors

Glen Ko is an AI developer at AWS Bedrock, where his focus is on enabling the proliferation of open source AI tooling and supporting open source innovation.

Anuj Jauhari is a Senior Product Marketing Manager at Amazon Web Services, where he helps customers realize value from innovations in generative AI.

Osman Ipek is a Solutions Architect on Amazon’s AGI team focusing on Nova foundation models. He guides teams to accelerate development through practical AI implementation strategies, with expertise spanning voice AI, NLP, and MLOps.

Artificial Intelligence

Build real-time conversational AI experiences using Amazon Nova Sonic and LiveKit

Solution overview

Prerequisites

Deploy the solution

Clean up

Conclusion

About the authors

Resources

Blog Topics

Follow

Learn

Resources

Developers

Help