Noise Suppression

Removes background noise from audio — file-based or real-time over a microphone stream. Powered by DeepFilterNet3, a state-of-the-art deep learning noise suppression model.

Live at voiceai.trouve.works/noise/.

Two operating modes

File-based denoising

  • Upload any audio file (WAV, MP3, OGG, FLAC, M4A, AAC, WebM, Opus, AIFF, WMA, plus anything else ffmpeg can decode)
  • Receive cleaned audio in your choice of output format (WAV, MP3, OGG, FLAC)
  • Upload limit: 50 MB per file
  • Preserves original sample rate by default

Real-time streaming denoising

  • Microphone connects through the browser
  • Audio streams to the server via WebSocket, gets denoised by DeepFilterNet3, and plays back clean audio in real time
  • Latency: ~150 ms (70 ms model inference + 80 ms jitter buffer)
  • Native 48 kHz processing — matches standard browser audio sample rates

Key capabilities

| Feature | Details |
| --- | --- |
| Model | DeepFilterNet3 (GPU-accelerated, stateless processing) |
| Input formats | 10+ formats (anything ffmpeg supports) |
| Output formats | WAV, MP3, OGG, FLAC |
| Real-time latency | ~150 ms end-to-end |
| Native sample rate | 48 kHz |
| GPU acceleration | NVIDIA CUDA support |
| Stateless | No per-connection state; a single model serves unlimited concurrent users |

Project structure

```
NoiseSuppression/
├── backend/                       # FastAPI server (port 8060)
│   ├── main.py                    # App setup, CORS, lifespan, routers
│   ├── requirements.txt
│   ├── .env.example               # Config template
│   ├── routers/
│   │   ├── file_processing.py     # POST /api/denoise/file
│   │   └── realtime.py            # WebSocket /ws/denoise
│   └── services/
│       ├── deepfilter_engine.py   # DeepFilterNet3 GPU inference wrapper
│       └── audio_converter.py     # Format conversion (pydub + soundfile)
├── frontend/                      # Next.js 15 UI (port 8061)
│   ├── app/page.tsx               # Root page
│   ├── hooks/
│   │   ├── useFileDenoiser.ts     # File upload with XHR progress
│   │   ├── useRealtimeDenoiser.ts # WebSocket + AudioWorklet
│   │   └── useAudioRecorder.ts    # Microphone recording
│   ├── components/
│   │   ├── NoiseSuppressor.tsx    # Root container
│   │   ├── file-mode/             # DropZone, RecorderPanel, WaveformViewer
│   │   └── realtime-mode/         # MicButton, LevelMeter
│   └── public/worklets/
│       └── pcm-processor.js       # AudioWorklet for playback buffering
└── realtime/                      # Standalone real-time-only UI (optional)
```

Backend architecture

Lifespan: the DeepFilterNet3 model is loaded once at application startup and shared across all requests.
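A minimal sketch of this pattern, assuming a `DeepFilterEngine` wrapper class as in `services/deepfilter_engine.py` (the import path and constructor here are illustrative):

```python
from contextlib import asynccontextmanager

from fastapi import FastAPI

from services.deepfilter_engine import DeepFilterEngine  # hypothetical import/constructor


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Load DeepFilterNet3 once; all requests and WebSocket connections
    # share the engine via app.state.
    app.state.engine = DeepFilterEngine()
    yield
    # The engine is stateless, so there is nothing to tear down.


app = FastAPI(lifespan=lifespan)
```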

DeepFilterEngine (services/deepfilter_engine.py)

  • Wraps the deepfilternet library (df.init_df(), df.enhance()); typical usage is sketched after this list
  • Model weights on GPU (~50 MB), audio tensors on CPU
  • Native sample rate: 48,000 Hz
  • Stateless: each chunk processed independently
  • Configurable GPU via GPU_DEVICE environment variable
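For reference, a sketch of how the wrapped library calls are typically driven (this mirrors the deepfilternet README usage; file names are placeholders):

```python
from df.enhance import enhance, init_df, load_audio, save_audio

# init_df() loads the DeepFilterNet3 weights and returns the model plus a
# DF state object that carries the native 48 kHz sample rate.
model, df_state, _ = init_df()

# Decode at the model's native rate, denoise, and write the result.
noisy, _ = load_audio("noisy.wav", sr=df_state.sr())
clean = enhance(model, df_state, noisy)
save_audio("clean.wav", clean, df_state.sr())
```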

AudioConverter (services/audio_converter.py)

  • Fast path: soundfile for WAV, FLAC, OGG, AIFF
  • Fallback: pydub + ffmpeg for any other format (MP3, M4A, AAC, WebM, etc.); the decode path is sketched after this list
  • Handles resampling (librosa / scipy) to/from 48 kHz
  • Exports to WAV, MP3, OGG, FLAC
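A minimal sketch of that fast-path/fallback decode under the assumptions above (the function name and int-to-float scaling are illustrative, not the project's actual code):

```python
import io

import librosa
import numpy as np
import soundfile as sf
from pydub import AudioSegment

TARGET_SR = 48_000  # DeepFilterNet3's native rate

def decode_to_48k_mono(data: bytes, ext: str) -> tuple[np.ndarray, int]:
    """Decode audio bytes to float32 mono at 48 kHz; also return the original rate."""
    if ext in ("wav", "flac", "ogg", "aiff"):
        # Fast path: soundfile reads these directly, no ffmpeg subprocess.
        audio, sr = sf.read(io.BytesIO(data), dtype="float32")
        if audio.ndim > 1:
            audio = audio.mean(axis=1)  # downmix to mono
    else:
        # Fallback: pydub shells out to ffmpeg (MP3, M4A, AAC, WebM, ...).
        seg = AudioSegment.from_file(io.BytesIO(data), format=ext).set_channels(1)
        audio = np.array(seg.get_array_of_samples(), dtype=np.float32)
        audio /= float(1 << (8 * seg.sample_width - 1))  # scale ints to [-1, 1]
        sr = seg.frame_rate
    if sr != TARGET_SR:
        audio = librosa.resample(audio, orig_sr=sr, target_sr=TARGET_SR)
    return audio, sr
```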

API reference

Base URL: https://voiceai.trouve.works/noise/api

File-based denoising

```
POST /api/denoise/file
Content-Type: multipart/form-data
```

Parameters:

  • file: the audio file (required, max 50 MB)
  • output_format: wav | mp3 | ogg | flac (default: wav)
  • restore_sample_rate: bool (default: true; restores the original sample rate)

Response: binary audio data with Content-Disposition header.

Processing pipeline:

  1. Decode audio (any format) to float32 mono
  2. Resample to 48 kHz if needed
  3. Process through DeepFilterNet3 (GPU)
  4. Resample back to original sample rate (if restore_sample_rate=true)
  5. Encode to requested output format
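For example, a client can exercise this endpoint as documented with Python requests (file names are placeholders):

```python
import requests

API = "https://voiceai.trouve.works/noise/api"

with open("noisy.m4a", "rb") as f:
    resp = requests.post(
        f"{API}/denoise/file",
        files={"file": ("noisy.m4a", f, "audio/mp4")},
        data={"output_format": "flac", "restore_sample_rate": "true"},
        timeout=300,
    )
resp.raise_for_status()

# The body is the encoded audio; the suggested filename arrives in the
# Content-Disposition header.
with open("clean.flac", "wb") as out:
    out.write(resp.content)
```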

Supported formats

GET /api/denoise/formats

```json
{
  "input": ["wav", "mp3", "ogg", "flac", "m4a", "aac", "webm", "aiff", "opus", "wma"],
  "output": ["wav", "mp3", "ogg", "flac"]
}
```

Real-time WebSocket

WebSocket: wss://voiceai.trouve.works/noise/ws/denoise

1. Handshake: the client sends a text frame:

```json
{"sample_rate": 48000}
```

The server responds:

```json
{"status": "ready", "sample_rate": 48000}
```

2. Streaming (binary frames, bidirectional): the client sends Float32 LE PCM chunks of any size; the server returns denoised Float32 LE PCM at the same sample rate.

If client sample rate ≠ 48 kHz, the server transparently resamples in both directions.
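A minimal Python client for this protocol. It assumes the server answers each binary frame with one denoised frame; the description above suggests lock-step framing but does not guarantee it:

```python
import asyncio
import json

import numpy as np
import websockets

WS_URL = "wss://voiceai.trouve.works/noise/ws/denoise"

async def main():
    async with websockets.connect(WS_URL) as ws:
        # Handshake: declare the client's sample rate in a text frame.
        await ws.send(json.dumps({"sample_rate": 48000}))
        ready = json.loads(await ws.recv())
        assert ready["status"] == "ready"

        # Stream ten 100 ms chunks of silence as stand-in microphone input.
        for _ in range(10):
            chunk = np.zeros(4800, dtype="<f4")  # Float32 LE PCM
            await ws.send(chunk.tobytes())
            clean = np.frombuffer(await ws.recv(), dtype="<f4")
            print(f"received {clean.size} denoised samples")

asyncio.run(main())
```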

Health check

GET /health

```json
{
  "status": "ok",
  "deepfilter_device": "cuda:2"
}
```

Frontend audio architecture

File mode pipeline

```
File / Recording → AudioContext.decodeAudioData() → XHR POST /api/denoise/file
Response blob    → AudioContext.decodeAudioData() → Canvas waveform
```

Real-time pipeline

```
Microphone (getUserMedia)
  → MediaStreamAudioSourceNode
  → AnalyserNode (frequency bars for the level meter)
  → ScriptProcessorNode (4096 samples, 48 kHz)
  → WebSocket binary frames (Float32 LE PCM)
  → Server (DeepFilterNet3 GPU inference)
  → WebSocket response frames
  → AudioWorklet (pcm-processor.js: ring buffer, 7200-sample jitter buffer)
  → AudioDestinationNode (speakers)
```

PCM processor worklet (its buffering logic is sketched after this list):

  • Ring buffer accumulates WebSocket chunks
  • Jitter buffer: 7200 samples (~150 ms at 48 kHz) before playback begins
  • Outputs 128-sample frames on demand
  • Zero-fills on underrun
  • Reports buffer level via message port
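The worklet itself is JavaScript; the sketch below restates its buffering logic in Python for clarity, using plain concatenation instead of a true ring buffer:

```python
import numpy as np

class JitterBuffer:
    """Mirror of the pcm-processor.js behaviour: hold playback until the
    jitter threshold fills, then emit fixed frames, zero-filling on underrun."""

    FRAME = 128        # AudioWorklet render quantum
    THRESHOLD = 7200   # ~150 ms at 48 kHz before playback begins

    def __init__(self) -> None:
        self.buf = np.zeros(0, dtype=np.float32)
        self.started = False

    def push(self, chunk: np.ndarray) -> None:
        # Accumulate an incoming WebSocket chunk.
        self.buf = np.concatenate([self.buf, chunk.astype(np.float32)])

    def pull(self) -> np.ndarray:
        # Called once per render quantum; always returns FRAME samples.
        if not self.started:
            if self.buf.size < self.THRESHOLD:
                return np.zeros(self.FRAME, dtype=np.float32)  # still priming
            self.started = True
        if self.buf.size < self.FRAME:
            return np.zeros(self.FRAME, dtype=np.float32)  # underrun: zero-fill
        frame, self.buf = self.buf[: self.FRAME], self.buf[self.FRAME :]
        return frame
```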

Configuration

Backend (.env)

| Variable | Default | Purpose |
| --- | --- | --- |
| HOST | 0.0.0.0 | Bind address |
| PORT | 8060 | Listen port |
| GPU_DEVICE | cuda:0 | GPU device (cuda:N or cpu) |
| MAX_UPLOAD_SIZE_MB | 50 | Max file upload size |
| CORS_ORIGINS | http://localhost:8061 | Allowed origins |
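Putting the defaults together, a backend/.env would look like:

```bash
# backend/.env
HOST=0.0.0.0
PORT=8060
GPU_DEVICE=cuda:0
MAX_UPLOAD_SIZE_MB=50
CORS_ORIGINS=http://localhost:8061
```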

Frontend (.env.local)

| Variable | Purpose |
| --- | --- |
| NEXT_PUBLIC_API_URL | Backend API base URL |
| NEXT_PUBLIC_WS_URL | WebSocket server URL |
| NEXT_PUBLIC_BASE_PATH | Sub-path for production deployment |
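For the production deployment above, plausible values would be the following (inferred from the URLs on this page; NEXT_PUBLIC_BASE_PATH=/noise is an assumption):

```bash
# frontend/.env.local
NEXT_PUBLIC_API_URL=https://voiceai.trouve.works/noise/api
NEXT_PUBLIC_WS_URL=wss://voiceai.trouve.works/noise/ws/denoise
NEXT_PUBLIC_BASE_PATH=/noise
```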

Key dependencies

Backend

| Package | Purpose |
| --- | --- |
| fastapi >= 0.111.0 | Web framework |
| websockets >= 12.0 | WebSocket support |
| deepfilternet | DeepFilterNet3 model |
| torch + torchaudio | ML framework (CUDA) |
| soundfile | Lossless audio I/O |
| pydub | Universal audio decoder (ffmpeg) |
| librosa | Audio resampling |

Frontend

| Package | Purpose |
| --- | --- |
| next 15.x | React framework |
| react 19.x | UI library |
| tailwindcss 4.x | Styling |

:::caution System requirement

ffmpeg must be installed on the host for MP3, M4A, AAC, and other compressed-format support.

:::

Use cases

  • Standalone audio cleanup for podcasts, recordings, and call center audio
  • Preprocessing before transcription or voice biometrics to improve accuracy
  • Inline denoising stage in the Voice Agent pipeline

Roadmap

  • Voice Activity Detection (VAD) integration
  • Gain / volume normalization
  • Echo cancellation