# Voice Agent

A fully self-hosted, end-to-end conversational voice AI system. Users speak naturally; the agent listens, understands, thinks, and responds with voice in real time.

Live at voiceai.trouve.works.
## How it works

The Voice Agent orchestrates three core AI components into a seamless conversation loop, plus voice activity detection at the edge:

```
User speaks
    │
    ▼
[Silero VAD] — Voice activity detection
    │
    ▼
[Whisper STT] — OpenAI-compatible endpoint
    │   base_url: https://voiceai.trouve.works/services/v1/
    ▼
[Qwen2.5-7B LLM] — self-hosted vLLM
    │   base_url: http://localhost:8090/v1
    │   ├── [Function tools] — optional tool execution
    ▼
[Kokoro TTS] — OpenAI-compatible endpoint
    │   base_url: https://voiceai.trouve.works/services/v1/
    ▼
User hears response
```

All of this happens in real time over WebRTC, powered by the LiveKit Agents SDK v1.4.6.
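The loop above can be sketched with stub components. This is an illustration of the pipeline's data flow only, not the LiveKit API; the `Pipeline` class and all stubs here are hypothetical stand-ins for Silero, Whisper, Qwen, and Kokoro:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Pipeline:
    """Illustrative VAD -> STT -> LLM -> TTS chain (not LiveKit's API)."""
    vad: Callable[[bytes], bool]  # returns True when speech is detected
    stt: Callable[[bytes], str]   # audio -> transcript
    llm: Callable[[str], str]     # transcript -> response text
    tts: Callable[[str], bytes]   # response text -> audio

    def handle(self, frame: bytes) -> Optional[bytes]:
        if not self.vad(frame):
            return None  # silence: nothing to transcribe
        return self.tts(self.llm(self.stt(frame)))


# Stubs stand in for the real models so the flow is visible end to end.
pipe = Pipeline(
    vad=lambda audio: len(audio) > 0,
    stt=lambda audio: "hello",
    llm=lambda text: f"You said: {text}",
    tts=lambda text: text.encode(),
)
print(pipe.handle(b"\x01\x02"))  # → b'You said: hello'
```

In the real system each stage is a separate service; LiveKit streams partial results between them so stages overlap rather than run strictly in sequence.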
## Key capabilities
| Capability | Detail |
|---|---|
| Real-time voice conversations | Sub-second latency |
| Voice Activity Detection | Silero — detects when user starts and stops speaking |
| Preemptive generation | LLM begins formulating a response while the user is still speaking |
| Multilingual turn detection | Natural turn-taking across languages |
| Tool / function calling | Agent executes actions (weather, DB queries, API calls) mid-conversation |
| RAG support | Extensible with retrieval-augmented generation |
| Multi-platform frontends | React web app, plus iOS/macOS, Flutter, React Native, Android, web embed, telephony |
| Customizable UI | Five audio visualizer styles (bar, wave, grid, radial, aura), theming, branding |
## Project structure

```
agent/agent_scratch/
├── agent.py                    # Python backend agent entry point
├── requirements.txt            # Python dependencies
├── .env / .env.local           # Environment variables
├── docker-compose.yml          # Docker orchestration
├── livekit/
│   └── Dockerfile              # LiveKit server build
└── conversationalai/           # Next.js frontend
    ├── app/
    │   ├── page.tsx            # Root page
    │   ├── layout.tsx          # App layout
    │   └── api/token/route.ts  # LiveKit token generation
    ├── components/
    │   ├── agents-ui/          # Audio visualizers, controls, transcript
    │   ├── ai-elements/        # Conversation UI elements
    │   ├── app/                # App-level components
    │   └── ui/                 # Primitive shadcn/ui components
    ├── hooks/                  # useAgentErrors, useDebug, useAgentControlBar
    ├── lib/utils.ts            # Config, token source utilities
    └── app-config.ts           # UI configuration interface
```
## Agent backend

The agent is built on LiveKit Agents SDK v1.4.6. The entry point:

```python
from livekit.agents import Agent, AgentSession, WorkerOptions, cli
from livekit.plugins import openai, silero


async def entrypoint(ctx):
    await ctx.connect()

    agent = Agent(
        instructions="System prompt here...",
        tools=[lookup_weather],  # Function tools
    )

    session = AgentSession(
        vad=silero.VAD.load(),
        stt=openai.STT(model="stt-1", base_url="..."),
        llm=openai.LLM(model="Qwen/Qwen2.5-7B-Instruct", base_url="..."),
        tts=openai.TTS(model="tts-1", voice="af_heart", base_url="..."),
    )

    await session.start(agent=agent, room=ctx.room)
    await session.generate_reply(instructions="greet the user")


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```
### Function calling / tools

Tools are defined with the `@function_tool` decorator:

```python
from livekit.agents import RunContext
from livekit.agents.llm.tool_context import function_tool


@function_tool
async def lookup_weather(context: RunContext, location: str):
    """Used to look up weather information."""
    return {"weather": "sunny", "temperature": 70}
```

- Type annotations are extracted for LLM schema generation
- Docstrings become tool descriptions
- Tools are passed to the `Agent(tools=[...])` constructor
- The framework handles tool call orchestration and result feeding
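The first two bullets can be made concrete with a simplified sketch of how a framework derives a tool schema from a Python signature. This is an illustration of the mechanism, not LiveKit's actual implementation; `tool_schema` and `PY_TO_JSON` are hypothetical names:

```python
import inspect
from typing import get_type_hints

# Minimal mapping from Python annotations to JSON Schema types.
PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}


def tool_schema(fn):
    """Build an LLM tool schema from a function's annotations and docstring."""
    hints = get_type_hints(fn)
    params = {
        name: {"type": PY_TO_JSON.get(tp, "string")}
        for name, tp in hints.items()
        if name not in ("return", "context")  # skip the framework-injected context
    }
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {
            "type": "object",
            "properties": params,
            "required": list(params),
        },
    }


async def lookup_weather(context, location: str):
    """Used to look up weather information."""
    return {"weather": "sunny", "temperature": 70}


schema = tool_schema(lookup_weather)
print(schema["name"])                      # lookup_weather
print(schema["parameters"]["properties"])  # {'location': {'type': 'string'}}
```

The real decorator does more (optional parameters, enums, result serialization), but the core idea is the same: the signature is the schema, and the docstring is the description the LLM sees.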
## Frontend token generation

`POST /api/token` generates LiveKit access tokens:

```ts
// app/api/token/route.ts
import { AccessToken } from 'livekit-server-sdk';

const at = new AccessToken(apiKey, apiSecret, {
  identity: participantName,
});
at.addGrant({
  room: roomName,
  roomJoin: true,
  canPublish: true,
  canSubscribe: true,
});
const token = await at.toJwt();
```
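Under the hood, `AccessToken` produces an HS256-signed JWT whose `video` claim carries the room grants. A stdlib-only Python sketch of that structure is below; the claim names follow the LiveKit token format as commonly documented, but verify against the SDK before relying on this:

```python
import base64
import hashlib
import hmac
import json
import time


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def make_token(api_key: str, api_secret: str, identity: str, room: str) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {
        "iss": api_key,                  # API key identifies the issuer
        "sub": identity,                 # participant identity
        "exp": int(time.time()) + 3600,  # 1-hour lifetime
        "video": {                       # room grants, as in at.addGrant(...)
            "room": room,
            "roomJoin": True,
            "canPublish": True,
            "canSubscribe": True,
        },
    }
    signing_input = (
        b64url(json.dumps(header).encode())
        + "."
        + b64url(json.dumps(claims).encode())
    )
    sig = hmac.new(api_secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + b64url(sig)


token = make_token("devkey", "secret", "alice", "demo-room")
print(token.count("."))  # → 2 (header.payload.signature)
```

In production, always use the official server SDK rather than hand-rolling tokens; the sketch is only meant to demystify what the frontend route hands back to the client.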
## Configuration

### Environment variables

| Variable | Purpose | Example |
|---|---|---|
| `LIVEKIT_API_KEY` | Room creation auth | `devkey` |
| `LIVEKIT_API_SECRET` | Token signing | `secret` |
| `LIVEKIT_URL` | LiveKit server URL | `wss://voiceai.trouve.works/livekit` |
| `AGENT_NAME` | Explicit agent dispatch (optional) | – |
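A hypothetical `.env.local` combining the variables above (values are the dev examples from the table, not production credentials; the `AGENT_NAME` value shown is an illustrative placeholder):

```
LIVEKIT_API_KEY=devkey
LIVEKIT_API_SECRET=secret
LIVEKIT_URL=wss://voiceai.trouve.works/livekit
# Optional: route jobs to an explicitly named agent
# AGENT_NAME=my-voice-agent
```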
### Frontend config (`app-config.ts`)

| Option | Type | Purpose |
|---|---|---|
| `supportsChatInput` | `boolean` | Enable text chat |
| `supportsVideoInput` | `boolean` | Enable camera |
| `supportsScreenShare` | `boolean` | Enable screen sharing |
| `audioVisualizerType` | `bar` \| `wave` \| `grid` \| `radial` \| `aura` | Visualizer style |
| `companyName` | `string` | Branding |
| `accent` / `accentDark` | hex color | Theme colors |
## Key dependencies

| Package | Version | Purpose |
|---|---|---|
| `livekit-agents` | 1.4.6 | Core agent framework |
| `livekit-plugins-openai` | 1.4.6 | OpenAI-compatible STT/LLM/TTS |
| `livekit-plugins-silero` | 1.4.6 | Silero VAD |
| `@livekit/components-react` | ^2.9.20 | React UI hooks |
| `livekit-client` | ^2.17.2 | WebRTC client |
| `next` | 15.5.9 | React framework |
## Use cases
- Customer support voice bots
- Multilingual voice assistants for apps
- Voice-enabled internal tools
- Interactive voice response (IVR) replacement
- Accessibility interfaces