
Deployment

Docker Compose, GPU allocation, and reverse-proxy configuration for production.

Docker Compose services

Voice Agent

```yaml
# docker-compose.yml
services:
  livekit:
    image: livekit/livekit-server:latest
    ports: ["7880:7880", "7881:7881"]
    command: --dev --bind "0.0.0.0"

# Optional services (currently commented out):
#   kokoro-fastapi-gpu -- TTS server
#   vllm-server        -- LLM inference (multi-GPU)
#   ollama             -- local LLM
#   whisper            -- STT server
#   agent              -- Python agent
#   frontend           -- Next.js app
```

Speaches (STT/TTS)

```yaml
# compose.yaml (CPU)
services:
  speaches:
    image: ghcr.io/speaches-ai/speaches:latest
    ports: ["8051:8000"]
    volumes:
      - ./model_aliases.json:/home/ubuntu/speaches/model_aliases.json
    healthcheck:
      # The check runs inside the container, so it must target the
      # container port (8000), not the published host port (8051).
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
```

`compose.cuda.yaml` (GPU) extends this base file with an nvidia/cuda:12.6.3 image and GPU device allocation.
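The CUDA overlay itself is not reproduced on this page. As a sketch only, a Compose GPU overlay for this service would typically look like the following; the device index follows the default allocation table later on this page, so verify both against the actual `compose.cuda.yaml`:

```yaml
# compose.cuda.yaml (illustrative sketch, not the actual file)
services:
  speaches:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["3"]   # GPU 3 per the default allocation
              capabilities: [gpu]
```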

Voice Biometrics

```yaml
# docker-compose.yml
services:
  postgres:
    image: postgres:16-alpine
    ports: ["8065:5432"]
    volumes: ["pgdata:/var/lib/postgresql/data"]
    healthcheck:
      test: ["CMD-SHELL", "pg_isready"]

  api:
    build: .
    depends_on:
      postgres: { condition: service_healthy }
    ports: ["8066:8000"]
    volumes: ["./storage:/storage"]
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
              device_ids: ["4"]  # physical GPU 4, visible as cuda:0 inside
```

GPU allocation

Default mapping for a 4-GPU production node:

| Service | Default GPU | Purpose |
| --- | --- | --- |
| Noise Suppression | cuda:2 | DeepFilterNet3 inference |
| Speaches (STT/TTS) | GPU 3 | Whisper + Kokoro |
| Voice Biometrics | GPU 4 (mapped to cuda:0) | ECAPA + pyannote |
| vLLM (LLM) | GPU 0–1 | Qwen2.5-7B inference |
| STT Test | cuda:4 | HuggingFace ASR models |

Override per-service via the relevant environment variable (GPU_DEVICE for Noise Suppression, DEVICE for Voice Biometrics, compose device_ids elsewhere).
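For the compose-managed services, the GPU pin can also be changed without editing the main file, via a local override. A sketch against the Voice Biometrics compose shown above:

```yaml
# docker-compose.override.yml (illustrative)
services:
  api:
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
              device_ids: ["2"]  # expose GPU 2 instead of the default 4
```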

Running locally

Noise Suppression

```shell
# Backend
cd NoiseSuppression/backend
cp .env.example .env        # edit GPU_DEVICE
pip install -r requirements.txt
python main.py              # runs on :8060

# Frontend
cd NoiseSuppression/frontend
npm install && npm run dev  # runs on :8061
```

Speaches

```shell
cd speaches
docker compose -f compose.yaml up -d       # CPU
docker compose -f compose.cuda.yaml up -d  # GPU
```

Voice Biometrics

```shell
cd VoiceBiometrics
docker compose up -d  # PostgreSQL + API
```

Voice Agent

```shell
# Backend agent
cd agent/agent_scratch
pip install -r requirements.txt
python agent.py dev  # connects to LiveKit

# Frontend
cd agent/agent_scratch/conversationalai
npm install && npm run dev
```
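`agent.py dev` reads its LiveKit connection settings from the environment. A minimal `.env` for the local `--dev` server above, assuming livekit-server's documented dev-mode key pair (`devkey`/`secret`); verify against your deployment:

```
# Standard LiveKit client variables (values assume the local dev server)
LIVEKIT_URL=ws://localhost:7880
LIVEKIT_API_KEY=devkey
LIVEKIT_API_SECRET=secret
```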

STT Test Adapter

```shell
cd stt_test
pip install -r requirements.txt
python server.py --model openai/whisper-large-v3 --port 8000
```

System requirements

| Requirement | Minimum | Recommended |
| --- | --- | --- |
| NVIDIA GPU | 1× (8 GB VRAM) | 4×+ (24 GB+ each) |
| CUDA | 12.1 | 12.6 |
| RAM | 16 GB | 64 GB+ |
| Storage | 50 GB | 200 GB+ (model cache) |
| OS | Ubuntu 22.04 | Ubuntu 24.04 |
| Docker | 24.x | latest |
| ffmpeg | required | latest |
| Python | 3.10 | 3.13 |
| Node.js | 18.x | 22.x |

Protocols & formats

| Concern | Implementation |
| --- | --- |
| API style | REST (FastAPI) + WebSocket |
| Real-time audio | WebSocket with binary Float32 LE PCM |
| Voice rooms | LiveKit (WebRTC) |
| API compatibility | OpenAI API format (STT/TTS endpoints) |
| Database | PostgreSQL 16 (Voice Biometrics) |
| Containerization | Docker + Docker Compose |
| GPU compute | NVIDIA CUDA 12.x |
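The binary audio framing in the table above (Float32 little-endian PCM over WebSocket) can be sketched in a few lines of Python. The 16 kHz rate and 20 ms frame size are illustrative assumptions, not values mandated by the services:

```python
import struct

def encode_frame(samples: list[float]) -> bytes:
    """Pack float samples into little-endian Float32 PCM bytes."""
    return struct.pack(f"<{len(samples)}f", *samples)

def decode_frame(data: bytes) -> list[float]:
    """Unpack little-endian Float32 PCM bytes back into samples."""
    return list(struct.unpack(f"<{len(data) // 4}f", data))

# A 20 ms frame at 16 kHz is 320 samples, i.e. 1280 bytes per WebSocket message.
frame = encode_frame([0.0] * 320)
assert len(frame) == 1280
assert decode_frame(frame) == [0.0] * 320
```

Each binary WebSocket message carries one such frame; JSON control messages, if any, travel as text frames alongside it.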