# Noise Suppression
Removes background noise from audio, either from a file or in real time over a microphone stream. Powered by DeepFilterNet3, a state-of-the-art deep-learning noise-suppression model.

Live at [voiceai.trouve.works/noise](https://voiceai.trouve.works/noise/).
## Two operating modes

### File-based denoising

- Upload any audio file (WAV, MP3, OGG, FLAC, M4A, AAC, WebM, Opus, AIFF, WMA, and more)
- Receive cleaned audio in your choice of output format (WAV, MP3, OGG, FLAC)
- Upload limit: 50 MB per file
- Preserves the original sample rate by default

### Real-time streaming denoising
- Microphone audio is captured in the browser (`getUserMedia`)
- Audio streams to the server via WebSocket, gets denoised by DeepFilterNet3, and plays back clean audio in real time
- Latency: ~150 ms (70 ms model inference + 80 ms jitter buffer)
- Native 48 kHz processing — matches standard browser audio sample rates
## Key capabilities

| Feature | Details |
|---|---|
| Model | DeepFilterNet3 — GPU-accelerated, stateless processing |
| Input formats | 11+ formats (anything ffmpeg supports) |
| Output formats | WAV, MP3, OGG, FLAC |
| Real-time latency | ~150 ms end-to-end |
| Native sample rate | 48 kHz |
| GPU acceleration | NVIDIA CUDA support |
| Stateless | No per-connection state; single model serves unlimited concurrent users |
## Project structure

```text
NoiseSuppression/
├── backend/                       # FastAPI server (port 8060)
│   ├── main.py                    # App setup, CORS, lifespan, routers
│   ├── requirements.txt
│   ├── .env.example               # Config template
│   ├── routers/
│   │   ├── file_processing.py     # POST /api/denoise/file
│   │   └── realtime.py            # WebSocket /ws/denoise
│   └── services/
│       ├── deepfilter_engine.py   # DeepFilterNet3 GPU inference wrapper
│       └── audio_converter.py     # Format conversion (pydub + soundfile)
├── frontend/                      # Next.js 15 UI (port 8061)
│   ├── app/page.tsx               # Root page
│   ├── hooks/
│   │   ├── useFileDenoiser.ts     # File upload with XHR progress
│   │   ├── useRealtimeDenoiser.ts # WebSocket + AudioWorklet
│   │   └── useAudioRecorder.ts    # Microphone recording
│   ├── components/
│   │   ├── NoiseSuppressor.tsx    # Root container
│   │   ├── file-mode/             # DropZone, RecorderPanel, WaveformViewer
│   │   └── realtime-mode/         # MicButton, LevelMeter
│   └── public/worklets/
│       └── pcm-processor.js       # AudioWorklet for playback buffering
└── realtime/                      # Standalone real-time-only UI (optional)
```
## Backend architecture

**Lifespan:** the DeepFilterNet3 model is loaded once at startup and shared across all requests.
### DeepFilterEngine (`services/deepfilter_engine.py`)

- Wraps the `deepfilternet` library (`df.init_df()`, `df.enhance()`)
- Model weights live on the GPU (~50 MB); audio tensors stay on the CPU
- Native sample rate: 48,000 Hz
- Stateless: each chunk is processed independently
- GPU device selectable via the `GPU_DEVICE` environment variable
### AudioConverter (`services/audio_converter.py`)

- Fast path: `soundfile` for WAV, FLAC, OGG, AIFF
- Fallback: `pydub` + `ffmpeg` for any other format (MP3, M4A, AAC, WebM, etc.)
- Handles resampling (`librosa`/`scipy`) to/from 48 kHz
- Exports to WAV, MP3, OGG, FLAC
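As a sketch, the fast-path/fallback choice boils down to an extension check; `pick_decoder` and the format set below are illustrative stand-ins, not the service's actual code:

```python
import pathlib

# Formats soundfile reads natively, per the fast-path list above.
SOUNDFILE_FORMATS = {".wav", ".flac", ".ogg", ".aiff"}

def pick_decoder(path: str) -> str:
    """Choose the fast soundfile path or the pydub/ffmpeg fallback by extension."""
    ext = pathlib.Path(path).suffix.lower()
    return "soundfile" if ext in SOUNDFILE_FORMATS else "pydub+ffmpeg"
```

Everything outside the `soundfile` set (MP3, M4A, AAC, WebM, ...) falls through to the `ffmpeg`-backed decoder, which is why `ffmpeg` is a host requirement.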
## API reference

Base URL: `https://voiceai.trouve.works/noise/api`
### File-based denoising

```
POST /api/denoise/file
Content-Type: multipart/form-data
```

Parameters:

| Parameter | Type | Default | Notes |
|---|---|---|---|
| `file` | audio file | required | Max 50 MB |
| `output_format` | string | `wav` | One of `wav`, `mp3`, `ogg`, `flac` |
| `restore_sample_rate` | bool | `true` | Restore the original sample rate |

Response: binary audio data with a `Content-Disposition` header.
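A minimal stdlib-only client sketch for this endpoint (the full URL and the `build_multipart` helper are assumptions for illustration; any HTTP client that speaks multipart/form-data works):

```python
import io
import urllib.request
import uuid

API_URL = "https://voiceai.trouve.works/noise/api/denoise/file"  # assumed full path

def build_multipart(filename: str, payload: bytes,
                    output_format: str = "wav",
                    restore_sample_rate: bool = True):
    """Build a multipart/form-data body and Content-Type for the endpoint."""
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    fields = {
        "output_format": output_format,
        "restore_sample_rate": "true" if restore_sample_rate else "false",
    }
    for name, value in fields.items():
        buf.write((f"--{boundary}\r\n"
                   f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                   f"{value}\r\n").encode())
    buf.write((f"--{boundary}\r\n"
               f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
               "Content-Type: application/octet-stream\r\n\r\n").encode())
    buf.write(payload + f"\r\n--{boundary}--\r\n".encode())
    return buf.getvalue(), f"multipart/form-data; boundary={boundary}"

# Usage (network call left commented out):
# body, ctype = build_multipart("noisy.wav", open("noisy.wav", "rb").read(), "flac")
# req = urllib.request.Request(API_URL, data=body, headers={"Content-Type": ctype})
# cleaned_bytes = urllib.request.urlopen(req).read()  # binary audio response
```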
Processing pipeline:

1. Decode audio (any format) to float32 mono
2. Resample to 48 kHz if needed
3. Run through DeepFilterNet3 on the GPU
4. Resample back to the original sample rate (if `restore_sample_rate=true`)
5. Encode to the requested output format
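The resample, denoise, resample-back flow can be sketched with a pluggable stand-in for the model. `naive_resample` here is a toy linear-interpolation resampler for illustration, not the service's `librosa`/`scipy` path:

```python
import numpy as np

TARGET_SR = 48_000  # DeepFilterNet3's native rate

def naive_resample(x: np.ndarray, sr_in: int, sr_out: int) -> np.ndarray:
    """Toy linear-interpolation resampler; only illustrates the rate change."""
    if sr_in == sr_out:
        return x
    n_out = int(round(len(x) * sr_out / sr_in))
    return np.interp(np.linspace(0, len(x) - 1, num=n_out),
                     np.arange(len(x)), x).astype(np.float32)

def denoise_pipeline(audio: np.ndarray, sr: int, denoise_fn,
                     restore_sample_rate: bool = True):
    """Mirror the documented pipeline after decoding, with a pluggable denoiser."""
    x = naive_resample(audio.astype(np.float32), sr, TARGET_SR)  # up to 48 kHz
    y = denoise_fn(x)  # stands in for DeepFilterNet3 GPU inference
    if restore_sample_rate:
        y = naive_resample(y, TARGET_SR, sr)  # back to the caller's rate
    return y, (sr if restore_sample_rate else TARGET_SR)
```

With `restore_sample_rate=False` the caller receives 48 kHz audio regardless of the input rate, which matches the endpoint's documented behavior.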
### Supported formats

```
GET /api/denoise/formats
```

```json
{
  "input": ["wav", "mp3", "ogg", "flac", "m4a", "aac", "webm", "aiff", "opus", "wma"],
  "output": ["wav", "mp3", "ogg", "flac"]
}
```
### Real-time WebSocket

Endpoint: `wss://voiceai.trouve.works/noise/ws/denoise`

1. **Handshake** (client sends a text frame):

   ```json
   {"sample_rate": 48000}
   ```

   Server responds:

   ```json
   {"status": "ready", "sample_rate": 48000}
   ```

2. **Streaming** (binary frames, bidirectional):
   - Client sends Float32 LE PCM chunks (any size)
   - Server returns denoised Float32 LE PCM at the same sample rate

If the client sample rate ≠ 48 kHz, the server transparently resamples in both directions.
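The binary framing is plain little-endian float32, so a client sketch needs only the stdlib to pack and unpack frames. `WS_URL` and `chunk` in the commented session are placeholders, and the session sketch assumes the third-party `websockets` package:

```python
import array
import json
import sys

def pcm_to_frame(samples) -> bytes:
    """Pack float samples as the little-endian Float32 PCM the socket expects."""
    a = array.array("f", samples)
    if sys.byteorder == "big":
        a.byteswap()  # wire format is little-endian
    return a.tobytes()

def frame_to_pcm(frame: bytes):
    """Unpack a denoised binary frame back into float samples."""
    a = array.array("f")
    a.frombytes(frame)
    if sys.byteorder == "big":
        a.byteswap()
    return a.tolist()

HANDSHAKE = json.dumps({"sample_rate": 48000})

# Sketch of a session with the `websockets` package (not run here):
# async with websockets.connect(WS_URL) as ws:
#     await ws.send(HANDSHAKE)
#     ready = json.loads(await ws.recv())   # {"status": "ready", ...}
#     await ws.send(pcm_to_frame(chunk))    # binary frame out
#     clean = frame_to_pcm(await ws.recv()) # denoised frame back
```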
### Health check

```
GET /health
```

```json
{
  "status": "ok",
  "deepfilter_device": "cuda:2"
}
```
## Frontend audio architecture

### File mode pipeline

```text
File / Recording → AudioContext.decodeAudioData() → XHR POST /api/denoise/file
                                                              │
Canvas waveform ← AudioContext.decodeAudioData() ← Response blob
```
### Real-time pipeline

```text
Microphone (getUserMedia)
  → MediaStreamAudioSourceNode
  → AnalyserNode                [frequency bars for level meter]
  → ScriptProcessorNode         (4096 samples, 48 kHz)
  → WebSocket binary frames     (Float32 LE PCM)
  → Server                      (DeepFilterNet3 GPU inference)
  → WebSocket response frames
  → AudioWorklet                (pcm-processor.js, ring buffer, 7200-sample jitter buffer)
  → AudioDestinationNode        (speakers)
```
PCM processor worklet:
- Ring buffer accumulates WebSocket chunks
- Jitter buffer: 7200 samples (~150 ms at 48 kHz) before playback begins
- Outputs 128-sample frames on demand
- Zero-fills on underrun
- Reports buffer level via message port
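The buffering policy above can be modeled in a few lines. This is a Python illustration of the worklet's logic, not the actual JavaScript; the threshold and frame size come from the description above:

```python
from collections import deque

class JitterBufferSketch:
    """Hold output until the jitter threshold is reached, then emit
    fixed-size frames, zero-filling (and re-priming) on underrun."""

    def __init__(self, threshold: int = 7200, frame_size: int = 128):
        self.threshold = threshold    # ~150 ms at 48 kHz by default
        self.frame_size = frame_size  # AudioWorklet render quantum
        self.buf = deque()
        self.started = False

    def push(self, samples) -> None:
        """Called when a denoised WebSocket chunk arrives."""
        self.buf.extend(samples)

    def pull(self):
        """Called by the audio callback for each output frame."""
        if not self.started:
            if len(self.buf) < self.threshold:
                return [0.0] * self.frame_size  # still priming: silence
            self.started = True
        if len(self.buf) < self.frame_size:
            self.started = False  # underrun: zero-fill and re-prime
            return [0.0] * self.frame_size
        return [self.buf.popleft() for _ in range(self.frame_size)]
```

Re-priming after an underrun trades a brief extra delay for continuity, rather than stuttering through a thin buffer sample by sample.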
## Configuration

### Backend (`.env`)

| Variable | Default | Purpose |
|---|---|---|
| `HOST` | `0.0.0.0` | Bind address |
| `PORT` | `8060` | Listen port |
| `GPU_DEVICE` | `cuda:0` | GPU device (`cuda:N` or `cpu`) |
| `MAX_UPLOAD_SIZE_MB` | `50` | Max file upload size |
| `CORS_ORIGINS` | `http://localhost:8061` | Allowed origins |
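A minimal `.env` matching the defaults above:

```ini
HOST=0.0.0.0
PORT=8060
GPU_DEVICE=cuda:0
MAX_UPLOAD_SIZE_MB=50
CORS_ORIGINS=http://localhost:8061
```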
### Frontend (`.env.local`)

| Variable | Purpose |
|---|---|
| `NEXT_PUBLIC_API_URL` | Backend API base URL |
| `NEXT_PUBLIC_WS_URL` | WebSocket server URL |
| `NEXT_PUBLIC_BASE_PATH` | Sub-path for production deployment |
## Key dependencies

### Backend

| Package | Purpose |
|---|---|
| `fastapi >= 0.111.0` | Web framework |
| `websockets >= 12.0` | WebSocket support |
| `deepfilternet` | DeepFilterNet3 model |
| `torch` + `torchaudio` | ML framework (CUDA) |
| `soundfile` | Lossless audio I/O |
| `pydub` | Universal audio decoder (`ffmpeg`) |
| `librosa` | Audio resampling |
### Frontend

| Package | Purpose |
|---|---|
| `next 15.x` | React framework |
| `react 19.x` | UI library |
| `tailwindcss 4.x` | Styling |
:::caution System requirement
ffmpeg must be installed on the host for MP3, M4A, AAC, and other compressed-format support.
:::
## Use cases
- Standalone audio cleanup for podcasts, recordings, and call center audio
- Preprocessing before transcription or voice biometrics — improves accuracy
- Inline preprocessing stage in the Voice Agent pipeline
## Roadmap
- Voice Activity Detection (VAD) integration
- Gain / volume normalization
- Echo cancellation