# Use cases

Common compositions of Voice SDK modules. Each row maps a real workload to the modules that make it possible.

| Use case | Modules |
|---|---|
| Call center AI | Voice Agent + Noise Suppression + Voice Biometrics |
| Meeting transcription | STT + Diarization + Language detection |
| Voice authentication | Voice Biometrics + Speaker ID |
| Podcast / audio cleanup | Noise Suppression |
| Multilingual voice assistant | Voice Agent (full pipeline) |
| Accessibility tools | STT + TTS |
| Speaker analytics | Voice Biometrics + Diarization |
## Patterns worth highlighting
### Denoise before transcribe
For call-center and field-recorded audio, route the audio through Noise Suppression first: DeepFilterNet3 cleanup measurably improves Whisper accuracy on noisy inputs without changing the rest of the pipeline.
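The handoff is a straightforward composition. A minimal sketch with injected callables standing in for the Noise Suppression and STT services (the function names and shapes here are assumptions for illustration, not SDK API):

```python
from typing import Callable


def transcribe_clean(
    audio: bytes,
    denoise: Callable[[bytes], bytes],
    transcribe: Callable[[bytes], str],
) -> str:
    # Run Noise Suppression first, then hand the cleaned audio to STT.
    # Nothing downstream changes: only the bytes fed to `transcribe`
    # differ from the raw recording.
    return transcribe(denoise(audio))
```

In practice each callable would wrap the corresponding service endpoint, so the same orchestration works whether the stages run in-process or over HTTP.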
### Diarize, then identify
For meeting recordings, run `/diarize` to get the speaker count and segment timing, then enroll the participants you care about and run `/identify` against the same recording. This separates who is talking from who that person is, which is useful when the participant list is only partially known.
### One agent, multiple frontends
A single Voice Agent backend serves React web, iOS/macOS, Flutter, React Native, Android, web embeds, and telephony integrations through LiveKit. Build the conversation logic once; ship it everywhere.
### OpenAI client, self-hosted endpoints
Because Speaches exposes OpenAI-compatible STT and TTS endpoints, existing applications can migrate from OpenAI to self-hosted by changing only `base_url`. No client rewrite required.
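The entire migration is one constructor argument. A sketch of the configuration delta (the local URL and port are assumptions; substitute your Speaches server's address):

```python
def client_config(self_hosted: bool) -> dict:
    # Keyword arguments for openai.OpenAI(...). The hosted and
    # self-hosted configs differ only in base_url; everything the
    # application does with the client is unchanged.
    cfg = {"api_key": "unused-by-speaches"}  # key is required by the client, ignored by Speaches
    if self_hosted:
        cfg["base_url"] = "http://localhost:8000/v1"  # assumed local address
    return cfg


# Usage (identical client code on both sides of the migration):
#   client = OpenAI(**client_config(self_hosted=True))
#   client.audio.transcriptions.create(model=..., file=f)
```

Keeping the delta in one factory function makes the hosted/self-hosted switch a deployment-time flag rather than a code change.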