Architecture

How the four Voice SDK modules connect, route, and share infrastructure.

Service topology

                     ┌───────────────────┐
                     │   Client Apps     │
                     │  (Web, Mobile,    │
                     │   Telephony)      │
                     └─────────┬─────────┘
                               │
                     ┌─────────▼─────────┐
                     │   Reverse Proxy   │  ← Nginx: routing, TLS, path dispatch
                     └─────────┬─────────┘
                               │
        ┌──────────┬───────────┼───────────┬──────────┐
        │          │           │           │          │
   ┌────▼────┐┌────▼────┐ ┌────▼─────┐┌────▼────┐┌────▼─────┐
   │  Voice  ││Speaches │ │  Noise   ││  Voice  ││ LiveKit  │
   │  Agent  ││(STT/TTS)│ │ Suppres. ││ Biomet. ││  Server  │
   │ :front  ││  :8051  │ │  :8060   ││  :8066  ││  :7880   │
   │ :back   ││         │ │  :8061   ││         ││          │
   └─────────┘└─────────┘ └──────────┘└────┬────┘└──────────┘
                                            │
                                       ┌────▼─────┐
                                       │PostgreSQL│
                                       │  :8065   │
                                       └──────────┘

A single Nginx reverse proxy fronts the platform at voiceai.trouve.works and dispatches to the right backend service based on URL path.

URL routing

Path	Backend service	Port
`/`	Voice Agent (Next.js frontend)	–
`/livekit`	LiveKit Server (WebRTC)	7880
`/services/`	Speaches (STT/TTS backend)	8051
`/utilities/`	Voice Utilities (static frontend)	8112
`/noise/`	Noise Suppression frontend	8061
`/noise/api/`	Noise Suppression backend	8060
`/noise/ws/`	Noise Suppression WebSocket	8060
`/biometric/`	Voice Biometrics frontend	–
`/biometric/api/`	Voice Biometrics backend	8066
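
The dispatch in the table above can be sketched as a longest-prefix lookup, which is how Nginx chooses among plain prefix `location` blocks. This is a minimal illustration, not the platform's actual Nginx configuration: the `Route` type and `resolve` function are hypothetical names, and a port of `None` stands for rows where the table lists no port.

```python
from typing import NamedTuple, Optional


class Route(NamedTuple):
    prefix: str
    service: str
    port: Optional[int]  # None where the routing table lists no port


# Rows mirror the routing table; nothing here is read from a real Nginx config.
ROUTES = [
    Route("/", "Voice Agent (Next.js frontend)", None),
    Route("/livekit", "LiveKit Server (WebRTC)", 7880),
    Route("/services/", "Speaches (STT/TTS backend)", 8051),
    Route("/utilities/", "Voice Utilities (static frontend)", 8112),
    Route("/noise/", "Noise Suppression frontend", 8061),
    Route("/noise/api/", "Noise Suppression backend", 8060),
    Route("/noise/ws/", "Noise Suppression WebSocket", 8060),
    Route("/biometric/", "Voice Biometrics frontend", None),
    Route("/biometric/api/", "Voice Biometrics backend", 8066),
]


def resolve(path: str) -> Route:
    """Return the route whose prefix is the longest match for `path`."""
    matches = [route for route in ROUTES if path.startswith(route.prefix)]
    if not matches:
        raise ValueError(f"path must start with '/': {path!r}")
    # The most specific (longest) prefix wins, so /noise/api/ beats /noise/.
    return max(matches, key=lambda route: len(route.prefix))
```

With these rows, any path under `/noise/api/` resolves to port 8060 while other paths under `/noise/` resolve to the frontend on 8061, and anything that matches no more specific prefix falls through to the Voice Agent at `/`.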

Data flow

Concern	Implementation
API style	REST (FastAPI) + WebSocket
Real-time audio	WebSocket with binary Float32 LE PCM
Voice rooms	LiveKit (WebRTC)
API compatibility	OpenAI API format for STT/TTS endpoints
Database	PostgreSQL 16 (Voice Biometrics only)
Containerization	Docker + Docker Compose
GPU compute	NVIDIA CUDA 12.x
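
As an illustration of the real-time audio row, binary Float32 LE PCM means each sample travels as a 4-byte little-endian IEEE 754 float. The sketch below packs and unpacks one such frame with the standard library. The function names are hypothetical, and the sample rate, channel count, and chunk size are not specified here, so the sketch makes no assumption about them.

```python
import struct
from typing import List, Sequence


def encode_pcm_f32le(samples: Sequence[float]) -> bytes:
    """Pack samples as consecutive little-endian 32-bit floats."""
    return struct.pack(f"<{len(samples)}f", *samples)


def decode_pcm_f32le(frame: bytes) -> List[float]:
    """Unpack a binary frame back into a list of float samples."""
    if len(frame) % 4 != 0:
        raise ValueError("frame length must be a multiple of 4 bytes")
    return list(struct.unpack(f"<{len(frame) // 4}f", frame))


# Four samples occupy 16 bytes on the wire.
frame = encode_pcm_f32le([0.0, 0.5, -0.25, 1.0])
samples = decode_pcm_f32le(frame)
```

A frame built this way could be sent as a single binary WebSocket message; the receiver needs only the byte length to recover the sample count.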

Storage layout

/storage/
  enrollments/{speaker_id}/    # Voice Biometrics — enrolled speaker audio
  jobs/{job_id}/               # Voice Biometrics — uploads + extracted segments
  models/                      # HuggingFace + Torch model cache (shared)

The /storage/models/ directory is mounted into every container that loads HuggingFace or Torch weights, so a model pulled once by any service is cached there and reused by the others instead of being downloaded again.
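
The layout above maps onto a few path helpers. This is a hedged sketch: the helper names and the identifier check are illustrative assumptions, not code from the platform.

```python
from pathlib import PurePosixPath

STORAGE_ROOT = PurePosixPath("/storage")


def _safe_segment(value: str) -> str:
    """Reject identifiers that could escape their parent directory."""
    if not value or "/" in value or value in (".", ".."):
        raise ValueError(f"invalid path segment: {value!r}")
    return value


def enrollment_dir(speaker_id: str) -> PurePosixPath:
    """Directory holding enrolled audio for one speaker."""
    return STORAGE_ROOT / "enrollments" / _safe_segment(speaker_id)


def job_dir(job_id: str) -> PurePosixPath:
    """Directory holding uploads and extracted segments for one job."""
    return STORAGE_ROOT / "jobs" / _safe_segment(job_id)


def model_cache_dir() -> PurePosixPath:
    """Shared model cache mounted into every model-loading container."""
    return STORAGE_ROOT / "models"
```

Hugging Face libraries read their cache location from the `HF_HOME` environment variable and Torch Hub reads `TORCH_HOME`. Pointing both at the shared models directory is one way to get the cross-service cache described above, though whether this deployment configures it that way is an assumption.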

Stateful vs stateless modules

Module	State
Voice Agent	Stateless. Per-room state is ephemeral and lives inside LiveKit
Voice Utilities (Speaches)	Stateless. Models cached in-process; `WHISPER__TTL=-1` keeps them resident
Noise Suppression	Stateless. DeepFilterNet3 processes each chunk independently
Voice Biometrics	Stateful. PostgreSQL stores speakers, embeddings, jobs, and segments

Only Voice Biometrics requires durable storage. The other modules can be torn down and rebuilt without data loss.

Where to next

Tech stack — frameworks, models, and versions across the platform
Deployment — Docker Compose, GPU allocation, and reverse proxy
Modules — internal architecture per module
