# Deployment

Docker Compose, GPU allocation, and reverse-proxy configuration for production.
## Docker Compose services
### Voice Agent

```yaml
# docker-compose.yml
services:
  livekit:
    image: livekit/livekit-server:latest
    ports: ["7880:7880", "7881:7881"]
    command: --dev --bind "0.0.0.0"
  # Optional (currently commented):
  # kokoro-fastapi-gpu -- TTS server
  # vllm-server        -- LLM inference (multi-GPU)
  # ollama             -- local LLM
  # whisper            -- STT server
  # agent              -- Python agent
  # frontend           -- Next.js app
```
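As an illustration, one of the commented-out services could be re-enabled with a stanza like the following. This is a sketch only: the image tag, port, and volume name are assumptions, not values taken from the project's compose file (`ollama/ollama` and port 11434 are the upstream defaults).

```yaml
# Sketch: enabling the optional ollama service.
# Image tag, port, and volume name are assumptions, not project values.
services:
  ollama:
    image: ollama/ollama:latest
    ports: ["11434:11434"]   # Ollama's upstream default port
    volumes:
      - ollama_data:/root/.ollama   # persist pulled models across restarts
volumes:
  ollama_data:
```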
### Speaches (STT/TTS)

```yaml
# compose.yaml (CPU)
services:
  speaches:
    image: ghcr.io/speaches-ai/speaches:latest
    ports: ["8051:8000"]
    volumes:
      - ./model_aliases.json:/home/ubuntu/speaches/model_aliases.json
    healthcheck:
      # The healthcheck runs inside the container, so it must target the
      # container port (8000), not the host-mapped port (8051).
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
```

`compose.cuda.yaml` (GPU) extends the base file with an `nvidia/cuda:12.6.3` image and GPU device allocation.
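A minimal sketch of what such a GPU override file might contain, using Compose's standard device-reservation syntax; the project's actual `compose.cuda.yaml` may differ, and the CUDA image tag here is a hypothetical:

```yaml
# compose.cuda.yaml -- illustrative override, not the project's actual file
services:
  speaches:
    image: ghcr.io/speaches-ai/speaches:latest-cuda   # hypothetical CUDA tag
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
              device_ids: ["3"]   # GPU 3, per the allocation table below
```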
### Voice Biometrics

```yaml
# docker-compose.yml
services:
  postgres:
    image: postgres:16-alpine
    ports: ["8065:5432"]
    volumes: ["pgdata:/var/lib/postgresql/data"]
    healthcheck:
      test: ["CMD-SHELL", "pg_isready"]
  api:
    build: .
    depends_on:
      postgres: { condition: service_healthy }
    ports: ["8066:8000"]
    volumes: ["./storage:/storage"]
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
              device_ids: ["4"]

# Named volume used by the postgres service above
volumes:
  pgdata:
```
## GPU allocation

Default mapping for a multi-GPU production node (GPU indices 0–4):
| Service | Default GPU | Purpose |
|---|---|---|
| Noise Suppression | cuda:2 | DeepFilterNet3 inference |
| Speaches (STT/TTS) | GPU 3 | Whisper + Kokoro |
| Voice Biometrics | GPU 4 (mapped to cuda:0) | ECAPA + pyannote |
| vLLM (LLM) | GPU 0–1 | Qwen2.5-7B inference |
| STT Test | cuda:4 | HuggingFace ASR models |
Override per-service via the relevant environment variable (`GPU_DEVICE` for Noise Suppression, `DEVICE` for Voice Biometrics, compose `device_ids` elsewhere).
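For example, moving Noise Suppression off its default `cuda:2` could look like this `.env` edit. The `cuda:N` value format follows the table above; treat the exact accepted values as an assumption to verify against `.env.example`:

```yaml
# NoiseSuppression/backend/.env -- move DeepFilterNet3 off the default cuda:2
# (value format assumed from the allocation table; check .env.example)
GPU_DEVICE: cuda:1
```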
## Running locally
### Noise Suppression

```bash
# Backend
cd NoiseSuppression/backend
cp .env.example .env        # edit GPU_DEVICE
pip install -r requirements.txt
python main.py              # runs on :8060

# Frontend
cd NoiseSuppression/frontend
npm install && npm run dev  # runs on :8061
```
### Speaches

```bash
cd speaches
docker compose -f compose.yaml up -d       # CPU
docker compose -f compose.cuda.yaml up -d  # GPU
```
### Voice Biometrics

```bash
cd VoiceBiometrics
docker compose up -d  # PostgreSQL + API
```
### Voice Agent

```bash
# Backend agent
cd agent/agent_scratch
pip install -r requirements.txt
python agent.py dev  # connects to LiveKit

# Frontend
cd agent/agent_scratch/conversationalai
npm install && npm run dev
```
### STT Test Adapter

```bash
cd stt_test
pip install -r requirements.txt
python server.py --model openai/whisper-large-v3 --port 8000
```
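Since the adapter exposes OpenAI-compatible STT endpoints (see Protocols & formats below), a smoke test against the running server might look like this. The route and multipart field names here assume the standard OpenAI audio API shape; verify them against the adapter's actual routes:

```shell
# Sketch: transcribe a local file against the adapter started above,
# assuming it mirrors OpenAI's /v1/audio/transcriptions route
# (path and field names are assumptions, not confirmed by this doc)
curl -s http://localhost:8000/v1/audio/transcriptions \
  -F file=@sample.wav \
  -F model=openai/whisper-large-v3
```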
## System requirements
| Requirement | Minimum | Recommended |
|---|---|---|
| NVIDIA GPU | 1× (8 GB VRAM) | 4×+ (24 GB+ each) |
| CUDA | 12.1 | 12.6 |
| RAM | 16 GB | 64 GB+ |
| Storage | 50 GB | 200 GB+ (model cache) |
| OS | Ubuntu 22.04 | Ubuntu 24.04 |
| Docker | 24.x | latest |
| ffmpeg | required | latest |
| Python | 3.10 | 3.13 |
| Node.js | 18.x | 22.x |
## Protocols & formats
| Concern | Implementation |
|---|---|
| API style | REST (FastAPI) + WebSocket |
| Real-time audio | WebSocket with binary Float32 LE PCM |
| Voice rooms | LiveKit (WebRTC) |
| API compatibility | OpenAI API format (STT/TTS endpoints) |
| Database | PostgreSQL 16 (Voice Biometrics) |
| Containerization | Docker + Docker Compose |
| GPU compute | NVIDIA CUDA 12.x |
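To illustrate the real-time audio format above: each WebSocket frame carries raw samples packed as little-endian 32-bit floats. A minimal standard-library sketch of that encoding (the chunk size here is illustrative, not a protocol requirement):

```python
import struct

def encode_pcm_f32le(samples: list[float]) -> bytes:
    """Pack audio samples as little-endian 32-bit floats --
    the binary payload format used on the WebSocket."""
    return struct.pack(f"<{len(samples)}f", *samples)

def decode_pcm_f32le(payload: bytes) -> list[float]:
    """Unpack a binary frame back into float samples."""
    count = len(payload) // 4  # 4 bytes per float32 sample
    return list(struct.unpack(f"<{count}f", payload))

# Round-trip a tiny chunk (values chosen to be exactly float32-representable)
chunk = [0.0, 0.5, -0.25, 1.0]
payload = encode_pcm_f32le(chunk)
assert len(payload) == 16               # 4 samples x 4 bytes each
assert decode_pcm_f32le(payload) == chunk
```

With NumPy on the receiving end, the same payload decodes as `np.frombuffer(payload, dtype="<f4")`.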