# Deployment

Docker Compose, GPU allocation, and reverse-proxy configuration for production.
## Docker Compose services
### Voice Agent

```yaml
# docker-compose.yml
services:
  livekit:
    image: livekit/livekit-server:latest
    ports: ["7880:7880", "7881:7881"]
    command: --dev --bind "0.0.0.0"
  # Optional (currently commented):
  # kokoro-fastapi-gpu -- TTS server
  # vllm-server        -- LLM inference (multi-GPU)
  # ollama             -- local LLM
  # whisper            -- STT server
  # agent              -- Python agent
  # frontend           -- Next.js app
```
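As an illustration, one of the commented-out services could be re-enabled with a stanza like the following. This is a sketch only: the image tag, port, and volume name are assumptions, not values taken from the project's compose file (`ollama/ollama` and port 11434 are the upstream defaults).

```yaml
# Sketch: enabling the optional ollama service.
# Image tag, port, and volume name are assumptions, not project values.
services:
  ollama:
    image: ollama/ollama:latest
    ports: ["11434:11434"]   # Ollama's upstream default port
    volumes:
      - ollama_data:/root/.ollama   # persist pulled models across restarts
volumes:
  ollama_data:
```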
### Speaches (STT/TTS)

```yaml
# compose.yaml (CPU)
services:
  speaches:
    image: ghcr.io/speaches-ai/speaches:latest
    ports: ["8051:8000"]
    volumes:
      - ./model_aliases.json:/home/ubuntu/speaches/model_aliases.json
    healthcheck:
      # The healthcheck runs inside the container, so it must target the
      # container port (8000), not the host-mapped port (8051).
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
```

`compose.cuda.yaml` (GPU) extends the base file with an `nvidia/cuda:12.6.3` image and GPU device allocation.
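A minimal sketch of what such a GPU override file might contain, using Compose's standard device-reservation syntax; the project's actual `compose.cuda.yaml` may differ, and the CUDA image tag here is a hypothetical:

```yaml
# compose.cuda.yaml -- illustrative override, not the project's actual file
services:
  speaches:
    image: ghcr.io/speaches-ai/speaches:latest-cuda   # hypothetical CUDA tag
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
              device_ids: ["3"]   # GPU 3, per the allocation table below
```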
### Voice Biometrics

```yaml
# docker-compose.yml
services:
  postgres:
    image: postgres:16-alpine
    ports: ["8065:5432"]
    volumes: ["pgdata:/var/lib/postgresql/data"]
    healthcheck:
      test: ["CMD-SHELL", "pg_isready"]
  api:
    build: .
    depends_on:
      postgres: { condition: service_healthy }
    ports: ["8066:8000"]
    volumes: ["./storage:/storage"]
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
              device_ids: ["4"]

# Named volume used by the postgres service above
volumes:
  pgdata:
```
## GPU allocation

Default mapping for a multi-GPU production node (GPU indices 0–4):
| Service | Default GPU | Purpose |
|---|---|---|
| Noise Suppression | cuda:2 | DeepFilterNet3 inference |
| Speaches (STT/TTS) | GPU 3 | Whisper + Kokoro |
| Voice Biometrics | GPU 4 (mapped to cuda:0) | ECAPA + pyannote |
| vLLM (LLM) | GPU 0–1 | Qwen2.5-7B inference |
| STT Test | cuda:4 | HuggingFace ASR models |
Override per-service via the relevant environment variable (`GPU_DEVICE` for Noise Suppression, `DEVICE` for Voice Biometrics, compose `device_ids` elsewhere).
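For example, moving Noise Suppression off its default `cuda:2` could look like this `.env` edit. The `cuda:N` value format follows the table above; treat the exact accepted values as an assumption to verify against `.env.example`:

```yaml
# NoiseSuppression/backend/.env -- move DeepFilterNet3 off the default cuda:2
# (value format assumed from the allocation table; check .env.example)
GPU_DEVICE: cuda:1
```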
## Running locally
### Noise Suppression

```bash
# Backend
cd NoiseSuppression/backend
cp .env.example .env        # edit GPU_DEVICE
pip install -r requirements.txt
python main.py              # runs on :8060

# Frontend
cd NoiseSuppression/frontend
npm install && npm run dev  # runs on :8061
```
### Speaches

```bash
cd speaches
docker compose -f compose.yaml up -d       # CPU
docker compose -f compose.cuda.yaml up -d  # GPU
```
### Voice Biometrics

```bash
cd VoiceBiometrics
docker compose up -d  # PostgreSQL + API
```
### Voice Agent

```bash
# Backend agent
cd agent/agent_scratch
pip install -r requirements.txt
python agent.py dev  # connects to LiveKit

# Frontend
cd agent/agent_scratch/conversationalai
npm install && npm run dev
```
### STT Test Adapter

```bash
cd stt_test
pip install -r requirements.txt
python server.py --model openai/whisper-large-v3 --port 8000
```
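Since the adapter exposes OpenAI-compatible STT endpoints (see Protocols & formats below), a smoke test against the running server might look like this. The route and multipart field names here assume the standard OpenAI audio API shape; verify them against the adapter's actual routes:

```shell
# Sketch: transcribe a local file against the adapter started above,
# assuming it mirrors OpenAI's /v1/audio/transcriptions route
# (path and field names are assumptions, not confirmed by this doc)
curl -s http://localhost:8000/v1/audio/transcriptions \
  -F file=@sample.wav \
  -F model=openai/whisper-large-v3
```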
## System requirements
| Requirement | Minimum | Recommended |
|---|---|---|
| NVIDIA GPU | 1× (8 GB VRAM) | 4×+ (24 GB+ each) |
| CUDA | 12.1 | 12.6 |
| RAM | 16 GB | 64 GB+ |
| Storage | 50 GB | 200 GB+ (model cache) |
| OS | Ubuntu 22.04 | Ubuntu 24.04 |
| Docker | 24.x | latest |
| ffmpeg | required | latest |
| Python | 3.10 | 3.13 |
| Node.js | 18.x | 22.x |
## Protocols & formats
| Concern | Implementation |
|---|---|
| API style | REST (FastAPI) + WebSocket |
| Real-time audio | WebSocket with binary Float32 LE PCM |
| Voice rooms | LiveKit (WebRTC) |
| API compatibility | OpenAI API format (STT/TTS endpoints) |
| Database | PostgreSQL 16 (Voice Biometrics) |
| Containerization | Docker + Docker Compose |
| GPU compute | NVIDIA CUDA 12.x |
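To illustrate the real-time audio format above: each WebSocket frame carries raw samples packed as little-endian 32-bit floats. A minimal standard-library sketch of that encoding (the chunk size here is illustrative, not a protocol requirement):

```python
import struct

def encode_pcm_f32le(samples: list[float]) -> bytes:
    """Pack audio samples as little-endian 32-bit floats --
    the binary payload format used on the WebSocket."""
    return struct.pack(f"<{len(samples)}f", *samples)

def decode_pcm_f32le(payload: bytes) -> list[float]:
    """Unpack a binary frame back into float samples."""
    count = len(payload) // 4  # 4 bytes per float32 sample
    return list(struct.unpack(f"<{count}f", payload))

# Round-trip a tiny chunk (values chosen to be exactly float32-representable)
chunk = [0.0, 0.5, -0.25, 1.0]
payload = encode_pcm_f32le(chunk)
assert len(payload) == 16               # 4 samples x 4 bytes each
assert decode_pcm_f32le(payload) == chunk
```

With NumPy on the receiving end, the same payload decodes as `np.frombuffer(payload, dtype="<f4")`.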