
Voice Agent

A fully self-hosted, end-to-end conversational voice AI system. Users speak naturally; the agent listens, understands, thinks, and responds with voice in real time.

Live at voiceai.trouve.works.

How it works

The Voice Agent orchestrates three core AI components into a seamless conversation loop, plus voice activity detection at the edge:

User speaks
    ↓
[Silero VAD] — voice activity detection
    ↓
[Whisper STT] — OpenAI-compatible endpoint
    base_url: https://voiceai.trouve.works/services/v1/
    ↓
[Qwen2.5-7B LLM] — self-hosted vLLM
    base_url: http://localhost:8090/v1
    └── [Function tools] — optional tool execution
    ↓
[Kokoro TTS] — OpenAI-compatible endpoint
    base_url: https://voiceai.trouve.works/services/v1/
    ↓
User hears response

All of this happens in real time over WebRTC, powered by LiveKit Agents SDK v1.4.6.

Key capabilities

| Capability | Detail |
| --- | --- |
| Real-time voice conversations | Sub-second latency |
| Voice activity detection | Silero — detects when the user starts and stops speaking |
| Preemptive generation | LLM begins formulating a response while the user is still speaking |
| Multilingual turn detection | Natural turn-taking across languages |
| Tool / function calling | Agent executes actions (weather, DB queries, API calls) mid-conversation |
| RAG support | Extensible with retrieval-augmented generation |
| Multi-platform frontends | React web app, plus iOS/macOS, Flutter, React Native, Android, web embed, telephony |
| Customizable UI | Five audio visualizer styles (bar, wave, grid, radial, aura), theming, branding |
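Preemptive generation and multilingual turn detection are both configured on the session. A minimal sketch — the parameter names (`preemptive_generation`, `turn_detection`) assume LiveKit Agents v1.x and should be verified against the installed SDK:

```python
from livekit.agents import AgentSession
from livekit.plugins import openai, silero
from livekit.plugins.turn_detector.multilingual import MultilingualModel

# Assumed v1.x parameter names -- check your installed SDK version.
session = AgentSession(
    vad=silero.VAD.load(),
    turn_detection=MultilingualModel(),  # language-aware end-of-turn model
    preemptive_generation=True,          # start LLM inference before the turn ends
    stt=openai.STT(model="stt-1", base_url="..."),
    llm=openai.LLM(model="Qwen/Qwen2.5-7B-Instruct", base_url="..."),
    tts=openai.TTS(model="tts-1", voice="af_heart", base_url="..."),
)
```

With preemptive generation, the response that was speculatively produced is discarded if the turn detector decides the user was not actually finished.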

Project structure

agent/agent_scratch/
├── agent.py                    # Python backend agent entry point
├── requirements.txt            # Python dependencies
├── .env / .env.local           # Environment variables
├── docker-compose.yml          # Docker orchestration
├── livekit/
│   └── Dockerfile              # LiveKit server build
└── conversationalai/           # Next.js frontend
    ├── app/
    │   ├── page.tsx            # Root page
    │   ├── layout.tsx          # App layout
    │   └── api/token/route.ts  # LiveKit token generation
    ├── components/
    │   ├── agents-ui/          # Audio visualizers, controls, transcript
    │   ├── ai-elements/        # Conversation UI elements
    │   ├── app/                # App-level components
    │   └── ui/                 # Primitive shadcn/ui components
    ├── hooks/                  # useAgentErrors, useDebug, useAgentControlBar
    ├── lib/utils.ts            # Config, token source utilities
    └── app-config.ts           # UI configuration interface

Agent backend

The agent is built on LiveKit Agents SDK v1.4.6. The entry point:

from livekit.agents import Agent, AgentSession, cli, WorkerOptions
from livekit.plugins import openai, silero

async def entrypoint(ctx):
    await ctx.connect()

    agent = Agent(
        instructions="System prompt here...",
        tools=[lookup_weather],  # Function tools
    )

    session = AgentSession(
        vad=silero.VAD.load(),
        stt=openai.STT(model="stt-1", base_url="..."),
        llm=openai.LLM(model="Qwen/Qwen2.5-7B-Instruct", base_url="..."),
        tts=openai.TTS(model="tts-1", voice="af_heart", base_url="..."),
    )

    await session.start(agent=agent, room=ctx.room)
    await session.generate_reply(instructions="greet the user")

cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))

Function calling / tools

Tools are defined with the @function_tool decorator:

from livekit.agents import RunContext
from livekit.agents.llm.tool_context import function_tool

@function_tool
async def lookup_weather(context: RunContext, location: str):
    """Used to look up weather information."""
    return {"weather": "sunny", "temperature": 70}
  • Type annotations are extracted for LLM schema generation
  • Docstrings become tool descriptions
  • Tools are passed to the Agent(tools=[...]) constructor
  • The framework handles tool call orchestration and result feeding
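The schema-extraction step can be illustrated without the framework. The sketch below is not LiveKit's actual implementation — it is a simplified stand-in showing how a signature and docstring map to an OpenAI-style tool schema; `build_tool_schema` and `PY_TO_JSON` are hypothetical names:

```python
import inspect
import typing

def lookup_weather(location: str, unit: str = "fahrenheit") -> dict:
    """Used to look up weather information."""
    return {"weather": "sunny", "temperature": 70}

# Minimal mapping from Python annotations to JSON-schema types
PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}

def build_tool_schema(fn) -> dict:
    """Derive an OpenAI-style tool schema from a function's signature."""
    sig = inspect.signature(fn)
    hints = typing.get_type_hints(fn)
    props = {
        name: {"type": PY_TO_JSON.get(hints.get(name, str), "string")}
        for name in sig.parameters
    }
    # Parameters without defaults are required
    required = [
        name for name, p in sig.parameters.items()
        if p.default is inspect.Parameter.empty
    ]
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn),  # docstring becomes the description
        "parameters": {"type": "object", "properties": props, "required": required},
    }

schema = build_tool_schema(lookup_weather)
```

Here `location` (no default) lands in `required` while `unit` stays optional, which is exactly the information the LLM needs to decide how to call the tool.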

Frontend token generation

POST /api/token generates LiveKit access tokens:

// app/api/token/route.ts
const at = new AccessToken(apiKey, apiSecret, {
  identity: participantName,
});
at.addGrant({
  room: roomName,
  roomJoin: true,
  canPublish: true,
  canSubscribe: true,
});
const token = await at.toJwt(); // signed JWT returned to the client

Configuration

Environment variables

| Variable | Purpose | Example |
| --- | --- | --- |
| LIVEKIT_API_KEY | Room creation auth | devkey |
| LIVEKIT_API_SECRET | Token signing | secret |
| LIVEKIT_URL | LiveKit server URL | wss://voiceai.trouve.works/livekit |
| AGENT_NAME | Explicit agent dispatch (optional) | |
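Since three of these variables are mandatory, it is worth failing fast at startup rather than mid-call. A small sketch — `load_livekit_config` is a hypothetical helper, not part of the SDK:

```python
import os

REQUIRED = ("LIVEKIT_API_KEY", "LIVEKIT_API_SECRET", "LIVEKIT_URL")

def load_livekit_config(env=os.environ) -> dict:
    """Read LiveKit settings, raising immediately if anything required is missing."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return {
        "api_key": env["LIVEKIT_API_KEY"],
        "api_secret": env["LIVEKIT_API_SECRET"],
        "url": env["LIVEKIT_URL"],
        "agent_name": env.get("AGENT_NAME"),  # optional: explicit agent dispatch
    }
```

Calling this once at the top of `agent.py` turns a vague runtime connection error into a clear configuration error.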

Frontend config (app-config.ts)

| Option | Type | Purpose |
| --- | --- | --- |
| supportsChatInput | boolean | Enable text chat |
| supportsVideoInput | boolean | Enable camera |
| supportsScreenShare | boolean | Enable screen sharing |
| audioVisualizerType | bar \| wave \| grid \| radial \| aura | Visualizer style |
| companyName | string | Branding |
| accent / accentDark | hex color | Theme colors |

Key dependencies

| Package | Version | Purpose |
| --- | --- | --- |
| livekit-agents | 1.4.6 | Core agent framework |
| livekit-plugins-openai | 1.4.6 | OpenAI-compatible STT/LLM/TTS |
| livekit-plugins-silero | 1.4.6 | Silero VAD |
| @livekit/components-react | ^2.9.20 | React UI hooks |
| livekit-client | ^2.17.2 | WebRTC client |
| next | 15.5.9 | React framework |

Use cases

  • Customer support voice bots
  • Multilingual voice assistants for apps
  • Voice-enabled internal tools
  • Interactive voice response (IVR) replacement
  • Accessibility interfaces