Introduction

Voice SDK is a containerized, end-to-end voice platform from Trouve Labs. Build production-grade audio and speech systems — transcription, synthesis, noise suppression, speaker biometrics, and conversational AI — from a single self-hosted stack.

Built and maintained by the AVML (Audio, Voice & Machine Learning) team. Live at voiceai.trouve.works.

What Voice SDK is

Instead of stitching together fragmented APIs from OpenAI, Google Cloud, Azure, and others, Voice SDK delivers the full voice pipeline as a single, modular SDK. Every module ships as a Docker container that runs anywhere — local, cloud, or edge — and exposes both a standalone REST/WebSocket API and a unified web interface.

Core pipeline

Audio Input → Processing → Understanding → Response → Monitoring

Five stages, four modules, one SDK. Use the modules independently, or compose them into a complete agent loop.

Why teams adopt it

| Advantage | What it means in practice |
| --- | --- |
| One SDK | Replaces multiple fragmented vendor APIs with a unified self-hosted stack |
| Container-first | Every component is a Docker image. The same artifact runs on a workstation, a cloud node, or an edge device |
| Self-hosted | No data leaves your infrastructure. Every model (STT, TTS, LLM, embedding, diarization) runs locally |
| GPU-accelerated | NVIDIA CUDA 12.x throughout, with multi-GPU allocation across services |
| Streaming | Real-time WebSocket / WebRTC paths for every interactive module |
| OpenAI-compatible | Drop-in replacement for the OpenAI audio transcription and speech endpoints |
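
To make the OpenAI-compatible claim concrete, here is a minimal sketch of an OpenAI-style transcription request built with the Python standard library. The route `/audio/transcriptions` and the `file` and `model` form fields follow the public OpenAI audio API. The base URL `http://localhost:8000/v1` and the model name `whisper-1` are placeholders assumed for illustration; this page does not state the actual host, port, or model identifiers of a Voice SDK deployment.

```python
import io
import uuid
import urllib.request

# Hypothetical values: the host, port, and model name are NOT given in this
# document. Replace them with the values of your own deployment.
BASE_URL = "http://localhost:8000/v1"
MODEL = "whisper-1"


def build_transcription_request(audio_bytes: bytes, filename: str,
                                base_url: str = BASE_URL,
                                model: str = MODEL) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style multipart transcription request."""
    boundary = uuid.uuid4().hex
    body = io.BytesIO()

    # Plain form field: the model to transcribe with.
    body.write(f"--{boundary}\r\n".encode())
    body.write(b'Content-Disposition: form-data; name="model"\r\n\r\n')
    body.write(model.encode() + b"\r\n")

    # File field: the raw audio payload.
    body.write(f"--{boundary}\r\n".encode())
    body.write(
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode()
    )
    body.write(b"Content-Type: application/octet-stream\r\n\r\n")
    body.write(audio_bytes + b"\r\n")

    # Closing boundary ends the multipart body.
    body.write(f"--{boundary}--\r\n".encode())

    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/audio/transcriptions",
        data=body.getvalue(),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. If the endpoint mirrors OpenAI's default response format, the reply should be JSON containing a `text` field, but that is worth verifying against the module's own API reference.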

Modules

| Module | Purpose | Path |
| --- | --- | --- |
| Voice Agent | End-to-end conversational AI over WebRTC | / |
| Voice Utilities | Transcription and speech synthesis studio | /utilities/ |
| Noise Suppression | DeepFilterNet3 audio cleanup, file and real-time | /noise/ |
| Voice Biometrics | Speaker identification and diarization | /biometric/ |
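
As a small illustration of how the module paths above resolve, the sketch below joins each path onto a base URL. It assumes HTTPS and the live host named earlier on this page (`voiceai.trouve.works`); a self-hosted deployment would substitute its own base URL. The helper name `module_url` is hypothetical and not part of the SDK.

```python
from urllib.parse import urljoin

# Assumption: HTTPS plus the host from "Live at voiceai.trouve.works".
# A self-hosted deployment would use its own base URL here.
BASE_URL = "https://voiceai.trouve.works"

# Paths as listed in the Modules table.
MODULE_PATHS = {
    "Voice Agent": "/",
    "Voice Utilities": "/utilities/",
    "Noise Suppression": "/noise/",
    "Voice Biometrics": "/biometric/",
}


def module_url(module: str, base_url: str = BASE_URL) -> str:
    """Return the web interface URL for a module (hypothetical helper)."""
    return urljoin(base_url, MODULE_PATHS[module])


if __name__ == "__main__":
    for name in MODULE_PATHS:
        print(f"{name}: {module_url(name)}")
```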

Where to next