A containerized, end-to-end voice platform for building intelligent audio systems anywhere — transcription, synthesis, denoising, biometrics, and conversational AI in one self-hosted stack.
Every voice workload follows the same shape. Voice SDK provides production-grade primitives at each stage — and lets you wire only the ones you need.
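That shape (for example: denoise, then transcribe, then reason, then synthesize) can be sketched as plain function composition. The stage functions below are hypothetical stand-ins for illustration, not VoiceSDK APIs:

```python
from typing import Callable

Stage = Callable[[bytes], bytes]  # each stage transforms the payload in turn

def pipeline(*stages: Stage) -> Stage:
    """Compose stages left to right; wire only the ones you need."""
    def run(payload: bytes) -> bytes:
        for stage in stages:
            payload = stage(payload)
        return payload
    return run

# Hypothetical stand-ins for two modules:
def denoise(audio: bytes) -> bytes:
    return audio.replace(b"noise+", b"")

def transcribe(audio: bytes) -> bytes:
    return b"transcript of " + audio

workflow = pipeline(denoise, transcribe)
print(workflow(b"noise+speech"))  # b'transcript of speech'
```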
Each module runs as a standalone service with its own REST or WebSocket API, or composes into the unified VoiceSDK platform with a single web interface.
Module 01 · Conversational Voice AI: End-to-end conversational voice AI over WebRTC. STT → LLM → TTS, with VAD, preemptive generation, and tool calling.
Module 02 · Voice Utilities: Self-hosted transcription and speech synthesis. Whisper Large V3, Kokoro, and XTTS v2 with voice cloning. 90+ languages.
Module 03 · Noise Suppression: DeepFilterNet3 audio cleanup. File mode for podcasts and recordings; real-time WebSocket mode at ~150 ms latency.
Module 04 · Voice Biometrics: Speaker identification and diarization. ECAPA-TDNN embeddings, pyannote 3.1 segmentation, and an async job pipeline.
Every module ships as a Docker container. Deploy locally, in the cloud, or at the edge with zero environment friction.
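A single-module deployment might look like the following sketch; the image name and port here are hypothetical, so check each module's docs for the published tags:

```shell
# Hypothetical image name and port; see the module's docs for real values
docker run --gpus all -p 8080:8080 voicesdk/noise-suppression:latest
```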
No data leaves your infrastructure. All models — Whisper, Kokoro, XTTS, Qwen, ECAPA, pyannote — run on your hardware.
NVIDIA CUDA 12.x throughout. Sub-second voice agent loop, 14.2 ms file denoising, 150 ms real-time streaming.
Drop-in replacement for OpenAI audio transcription and speech endpoints. Migrate without rewriting client code.
Point your existing OpenAI client at voiceai.trouve.works and transcribe, synthesize, denoise, or identify speakers — without rewriting a line of integration code.
from openai import OpenAI

client = OpenAI(
    base_url="https://voiceai.trouve.works/services/v1",
    api_key="not-needed",  # placeholder — no key required for self-hosted deployments
)

# Speech-to-text — OpenAI-compatible, self-hosted
with open("call.wav", "rb") as audio:
    transcript = client.audio.transcriptions.create(
        model="stt-1",
        file=audio,
    )

print(transcript.text)
$ pip install openai
$ python quickstart.py
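Synthesis flows through the same gateway. Using only the standard library, a request against the OpenAI-style /audio/speech path can be built as below; note that "tts-1" and "alloy" mirror OpenAI's names and are assumptions here, since the self-hosted deployment may expose different model and voice identifiers:

```python
import json
import urllib.request

BASE_URL = "https://voiceai.trouve.works/services/v1"

def speech_request(text: str, model: str = "tts-1", voice: str = "alloy"):
    """Build a POST against the OpenAI-style /audio/speech endpoint.

    "tts-1" and "alloy" mirror OpenAI's names; your deployment may
    expose different model and voice identifiers.
    """
    body = json.dumps({"model": model, "voice": voice, "input": text}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/audio/speech",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Send with urllib.request.urlopen(...) and write the binary body to an
# audio file; the OpenAI client's client.audio.speech.create works the same way.
req = speech_request("Hello from a self-hosted stack.")
print(req.get_method(), req.full_url)
```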