# Voice Agent

A fully self-hosted, end-to-end conversational voice AI system. Users speak naturally; the agent listens, understands, thinks, and responds with voice in real time.

Live at voiceai.trouve.works.
## How it works

The Voice Agent orchestrates three core AI components into a seamless conversation loop, plus voice activity detection at the edge:

```
User speaks
    │
    ▼
[Silero VAD] — Voice activity detection
    │
    ▼
[Whisper STT] — OpenAI-compatible endpoint
    │   base_url: https://voiceai.trouve.works/services/v1/
    ▼
[Qwen2.5-7B LLM] — self-hosted vLLM
    │   base_url: http://localhost:8090/v1
    │   ├── [Function tools] — optional tool execution
    ▼
[Kokoro TTS] — OpenAI-compatible endpoint
    │   base_url: https://voiceai.trouve.works/services/v1/
    ▼
User hears response
```

All of this happens in real time over WebRTC, powered by the LiveKit Agents SDK v1.4.6.
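The loop above can be sketched with stub components. This is an illustration of the pipeline's data flow only, not the LiveKit API; the `Pipeline` class and all stubs here are hypothetical stand-ins for Silero, Whisper, Qwen, and Kokoro:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Pipeline:
    """Illustrative VAD -> STT -> LLM -> TTS chain (not LiveKit's API)."""
    vad: Callable[[bytes], bool]  # returns True when speech is detected
    stt: Callable[[bytes], str]   # audio -> transcript
    llm: Callable[[str], str]     # transcript -> response text
    tts: Callable[[str], bytes]   # response text -> audio

    def handle(self, frame: bytes) -> Optional[bytes]:
        if not self.vad(frame):
            return None  # silence: nothing to transcribe
        return self.tts(self.llm(self.stt(frame)))


# Stubs stand in for the real models so the flow is visible end to end.
pipe = Pipeline(
    vad=lambda audio: len(audio) > 0,
    stt=lambda audio: "hello",
    llm=lambda text: f"You said: {text}",
    tts=lambda text: text.encode(),
)
print(pipe.handle(b"\x01\x02"))  # → b'You said: hello'
```

In the real system each stage is a separate service; LiveKit streams partial results between them so stages overlap rather than run strictly in sequence.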
## Key capabilities
| Capability | Detail |
|---|---|
| Real-time voice conversations | Sub-second latency |
| Voice Activity Detection | Silero — detects when user starts and stops speaking |
| Preemptive generation | LLM begins formulating a response while the user is still speaking |
| Multilingual turn detection | Natural turn-taking across languages |
| Tool / function calling | Agent executes actions (weather, DB queries, API calls) mid-conversation |
| RAG support | Extensible with retrieval-augmented generation |
| Multi-platform frontends | React web app, plus iOS/macOS, Flutter, React Native, Android, web embed, telephony |
| Customizable UI | Five audio visualizer styles (bar, wave, grid, radial, aura), theming, branding |
## Project structure

```
agent/agent_scratch/
├── agent.py                    # Python backend agent entry point
├── requirements.txt            # Python dependencies
├── .env / .env.local           # Environment variables
├── docker-compose.yml          # Docker orchestration
├── livekit/
│   └── Dockerfile              # LiveKit server build
└── conversationalai/           # Next.js frontend
    ├── app/
    │   ├── page.tsx            # Root page
    │   ├── layout.tsx          # App layout
    │   └── api/token/route.ts  # LiveKit token generation
    ├── components/
    │   ├── agents-ui/          # Audio visualizers, controls, transcript
    │   ├── ai-elements/        # Conversation UI elements
    │   ├── app/                # App-level components
    │   └── ui/                 # Primitive shadcn/ui components
    ├── hooks/                  # useAgentErrors, useDebug, useAgentControlBar
    ├── lib/utils.ts            # Config, token source utilities
    └── app-config.ts           # UI configuration interface
```
## Agent backend

The agent is built on LiveKit Agents SDK v1.4.6. The entry point:

```python
from livekit.agents import Agent, AgentSession, WorkerOptions, cli
from livekit.plugins import openai, silero


async def entrypoint(ctx):
    await ctx.connect()

    agent = Agent(
        instructions="System prompt here...",
        tools=[lookup_weather],  # Function tools
    )

    session = AgentSession(
        vad=silero.VAD.load(),
        stt=openai.STT(model="stt-1", base_url="..."),
        llm=openai.LLM(model="Qwen/Qwen2.5-7B-Instruct", base_url="..."),
        tts=openai.TTS(model="tts-1", voice="af_heart", base_url="..."),
    )

    await session.start(agent=agent, room=ctx.room)
    await session.generate_reply(instructions="greet the user")


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```
### Function calling / tools

Tools are defined with the `@function_tool` decorator:

```python
from livekit.agents import RunContext
from livekit.agents.llm.tool_context import function_tool


@function_tool
async def lookup_weather(context: RunContext, location: str):
    """Used to look up weather information."""
    return {"weather": "sunny", "temperature": 70}
```

- Type annotations are extracted for LLM schema generation
- Docstrings become tool descriptions
- Tools are passed to the `Agent(tools=[...])` constructor
- The framework handles tool call orchestration and result feeding
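The first two bullets can be made concrete with a simplified sketch of how a framework derives a tool schema from a Python signature. This is an illustration of the mechanism, not LiveKit's actual implementation; `tool_schema` and `PY_TO_JSON` are hypothetical names:

```python
import inspect
from typing import get_type_hints

# Minimal mapping from Python annotations to JSON Schema types.
PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}


def tool_schema(fn):
    """Build an LLM tool schema from a function's annotations and docstring."""
    hints = get_type_hints(fn)
    params = {
        name: {"type": PY_TO_JSON.get(tp, "string")}
        for name, tp in hints.items()
        if name not in ("return", "context")  # skip the framework-injected context
    }
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {
            "type": "object",
            "properties": params,
            "required": list(params),
        },
    }


async def lookup_weather(context, location: str):
    """Used to look up weather information."""
    return {"weather": "sunny", "temperature": 70}


schema = tool_schema(lookup_weather)
print(schema["name"])                      # lookup_weather
print(schema["parameters"]["properties"])  # {'location': {'type': 'string'}}
```

The real decorator does more (optional parameters, enums, result serialization), but the core idea is the same: the signature is the schema, and the docstring is the description the LLM sees.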
## Frontend token generation

`POST /api/token` generates LiveKit access tokens:

```ts
// app/api/token/route.ts
import { AccessToken } from 'livekit-server-sdk';

const at = new AccessToken(apiKey, apiSecret, {
  identity: participantName,
});
at.addGrant({
  room: roomName,
  roomJoin: true,
  canPublish: true,
  canSubscribe: true,
});
const token = await at.toJwt();
```
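Under the hood, `AccessToken` produces an HS256-signed JWT whose `video` claim carries the room grants. A stdlib-only Python sketch of that structure is below; the claim names follow the LiveKit token format as commonly documented, but verify against the SDK before relying on this:

```python
import base64
import hashlib
import hmac
import json
import time


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def make_token(api_key: str, api_secret: str, identity: str, room: str) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {
        "iss": api_key,                  # API key identifies the issuer
        "sub": identity,                 # participant identity
        "exp": int(time.time()) + 3600,  # 1-hour lifetime
        "video": {                       # room grants, as in at.addGrant(...)
            "room": room,
            "roomJoin": True,
            "canPublish": True,
            "canSubscribe": True,
        },
    }
    signing_input = (
        b64url(json.dumps(header).encode())
        + "."
        + b64url(json.dumps(claims).encode())
    )
    sig = hmac.new(api_secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + b64url(sig)


token = make_token("devkey", "secret", "alice", "demo-room")
print(token.count("."))  # → 2 (header.payload.signature)
```

In production, always use the official server SDK rather than hand-rolling tokens; the sketch is only meant to demystify what the frontend route hands back to the client.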
## Configuration

### Environment variables

| Variable | Purpose | Example |
|---|---|---|
| `LIVEKIT_API_KEY` | Room creation auth | `devkey` |
| `LIVEKIT_API_SECRET` | Token signing | `secret` |
| `LIVEKIT_URL` | LiveKit server URL | `wss://voiceai.trouve.works/livekit` |
| `AGENT_NAME` | Explicit agent dispatch (optional) | – |
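A hypothetical `.env.local` combining the variables above (values are the dev examples from the table, not production credentials; the `AGENT_NAME` value shown is an illustrative placeholder):

```
LIVEKIT_API_KEY=devkey
LIVEKIT_API_SECRET=secret
LIVEKIT_URL=wss://voiceai.trouve.works/livekit
# Optional: route jobs to an explicitly named agent
# AGENT_NAME=my-voice-agent
```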
### Frontend config (`app-config.ts`)

| Option | Type | Purpose |
|---|---|---|
| `supportsChatInput` | `boolean` | Enable text chat |
| `supportsVideoInput` | `boolean` | Enable camera |
| `supportsScreenShare` | `boolean` | Enable screen sharing |
| `audioVisualizerType` | `bar` \| `wave` \| `grid` \| `radial` \| `aura` | Visualizer style |
| `companyName` | `string` | Branding |
| `accent` / `accentDark` | hex color | Theme colors |
## Key dependencies

| Package | Version | Purpose |
|---|---|---|
| `livekit-agents` | 1.4.6 | Core agent framework |
| `livekit-plugins-openai` | 1.4.6 | OpenAI-compatible STT/LLM/TTS |
| `livekit-plugins-silero` | 1.4.6 | Silero VAD |
| `@livekit/components-react` | ^2.9.20 | React UI hooks |
| `livekit-client` | ^2.17.2 | WebRTC client |
| `next` | 15.5.9 | React framework |
## Use cases
- Customer support voice bots
- Multilingual voice assistants for apps
- Voice-enabled internal tools
- Interactive voice response (IVR) replacement
- Accessibility interfaces