Prerequisites

What your host needs before running Voice SDK locally or in production.

Hardware

Requirement	Minimum	Recommended
NVIDIA GPU	1× (8 GB VRAM)	4×+ (24 GB+ each)
CUDA	12.1	12.6
RAM	16 GB	64 GB+
Storage	50 GB	200 GB+ (model cache)

The recommended footprint matches the production layout: GPUs are allocated per service, so STT/TTS, denoising, biometrics, and the LLM run in parallel without contending for the same device. See Deployment for the default GPU mapping.
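A quick preflight check can confirm that a host meets the GPU minimum in the table above. The sketch below is not part of Voice SDK. It assumes `nvidia-smi` is on the `PATH`, and the function names and the VRAM threshold are illustrative choices. The threshold sits a little below 8192 MiB because nominal 8 GB cards often report slightly less usable memory.

```python
import shutil
import subprocess

MIN_GPUS = 1          # minimum GPU count from the hardware table
MIN_VRAM_MIB = 7500   # assumed headroom below a nominal 8 GB (8192 MiB)


def parse_vram_mib(smi_output: str) -> list[int]:
    """Turn nvidia-smi's one-number-per-line output into a list of MiB values."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def meets_gpu_minimum(vram_mib: list[int],
                      min_gpus: int = MIN_GPUS,
                      min_vram_mib: int = MIN_VRAM_MIB) -> bool:
    """True if at least `min_gpus` GPUs each report `min_vram_mib` or more."""
    return sum(1 for v in vram_mib if v >= min_vram_mib) >= min_gpus


def query_vram_mib() -> list[int]:
    """Ask nvidia-smi for total memory per GPU; returns [] if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    return parse_vram_mib(result.stdout) if result.returncode == 0 else []


if __name__ == "__main__":
    gpus = query_vram_mib()
    print(f"GPUs found: {len(gpus)}; minimum met: {meets_gpu_minimum(gpus)}")
```

To test against the recommended footprint instead, call `meets_gpu_minimum(gpus, min_gpus=4, min_vram_mib=24000)`; the second value is again an approximate stand-in for a nominal 24 GB.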

Operating system

Component	Minimum	Recommended
OS	Ubuntu 22.04	Ubuntu 24.04
Docker	24.x	Latest
`ffmpeg`	Required	Latest
Python	3.10	3.13
Node.js	18.x	22.x

`ffmpeg` is required to handle MP3, M4A, AAC, and other compressed audio formats in the noise suppression and STT/TTS services.
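The software minimums above can be checked with a short script. This is a sketch under stated assumptions rather than an official installer check: the helper names are hypothetical, and the version parser simply takes the first dotted number it finds in a string such as the output of `docker --version`.

```python
import re
import shutil
import sys


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the first dotted version number from a string like 'v22.3.0'."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def at_least(found: str, minimum: str) -> bool:
    """Compare versions component-wise, padding the shorter one with zeros."""
    have, need = parse_version(found), parse_version(minimum)
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


def missing_tools(names, which=shutil.which):
    """Return the tools from `names` that are not found on the PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    python_version = ".".join(str(n) for n in sys.version_info[:3])
    print("Python >= 3.10:", at_least(python_version, "3.10"))
    print("Missing tools:", missing_tools(["ffmpeg", "docker", "node"]))
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is enough.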

Tokens & secrets

Variable	Required for	How to obtain
`HF_TOKEN`	Voice Biometrics (pyannote diarization)	Hugging Face account → access token
`LIVEKIT_API_KEY` / `LIVEKIT_API_SECRET`	Voice Agent	Generated by the LiveKit server (`devkey` / `secret` in dev)

When `HF_TOKEN` is absent, Voice Biometrics falls back to the SpeechBrain sliding-window diarizer: you lose pyannote's accuracy, but diarization keeps working.
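The effect of these variables can be previewed before starting any service. The sketch below only mirrors the behavior described above; it is not the SDK's own selection logic, and the function names and returned labels are hypothetical.

```python
import os
from typing import Mapping


def choose_diarizer(env: Mapping[str, str]) -> str:
    """Return a label for the diarizer implied by the environment.

    The labels are illustrative, not identifiers used by Voice SDK.
    """
    if env.get("HF_TOKEN", "").strip():
        return "pyannote"
    return "speechbrain-sliding-window"


def missing_livekit_vars(env: Mapping[str, str]) -> list[str]:
    """List the LiveKit variables the Voice Agent needs that are unset or empty."""
    required = ("LIVEKIT_API_KEY", "LIVEKIT_API_SECRET")
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    print("Diarizer:", choose_diarizer(os.environ))
    print("Missing LiveKit variables:", missing_livekit_vars(os.environ))
```

Treating an empty or whitespace-only value as absent is a design choice of this sketch; it catches the common case of a variable exported with no value.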

Network

- HTTPS reverse proxy (Nginx is the reference) for path-based dispatch across modules
- WebSocket support (`/noise/ws/`) and WebRTC support (`/livekit`) on the proxy
- Outbound access to `huggingface.co` and `pypi.org` for first-time model and dependency downloads
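The outbound requirement can be probed before the first model download. This is a minimal sketch that assumes a direct TCP connection to port 443 is representative; it does not honor HTTP(S) proxy settings, so a host that reaches the internet only through a proxy may be reported as unreachable. The function names are illustrative.

```python
import socket
from typing import Callable, Iterable

HOSTS = ("huggingface.co", "pypi.org")  # hosts named in the network requirements


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures, refusals, and timeouts
        return False


def unreachable_hosts(hosts: Iterable[str] = HOSTS,
                      probe: Callable[[str], bool] = can_connect) -> list[str]:
    """Return the hosts for which the probe fails, in the order given."""
    return [host for host in hosts if not probe(host)]
```

Calling `unreachable_hosts()` with no arguments probes both hosts and returns an empty list when outbound access works. The `probe` parameter lets the check be exercised without touching the network.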
