# Noise Suppression
Removes background noise from audio, either from a file or in real time over a microphone stream. Powered by DeepFilterNet3, a state-of-the-art deep-learning noise-suppression model.

Live at [voiceai.trouve.works/noise](https://voiceai.trouve.works/noise/).
## Two operating modes

### File-based denoising

- Upload any audio file (WAV, MP3, OGG, FLAC, M4A, AAC, WebM, Opus, AIFF, WMA, and more)
- Receive cleaned audio in your choice of output format (WAV, MP3, OGG, FLAC)
- Upload limit: 50 MB per file
- Preserves the original sample rate by default

### Real-time streaming denoising
- Microphone audio is captured in the browser (`getUserMedia`)
- Audio streams to the server via WebSocket, gets denoised by DeepFilterNet3, and plays back clean audio in real time
- Latency: ~150 ms (70 ms model inference + 80 ms jitter buffer)
- Native 48 kHz processing — matches standard browser audio sample rates
## Key capabilities

| Feature | Details |
|---|---|
| Model | DeepFilterNet3 — GPU-accelerated, stateless processing |
| Input formats | 11+ formats (anything ffmpeg supports) |
| Output formats | WAV, MP3, OGG, FLAC |
| Real-time latency | ~150 ms end-to-end |
| Native sample rate | 48 kHz |
| GPU acceleration | NVIDIA CUDA support |
| Stateless | No per-connection state; single model serves unlimited concurrent users |
## Project structure

```text
NoiseSuppression/
├── backend/                       # FastAPI server (port 8060)
│   ├── main.py                    # App setup, CORS, lifespan, routers
│   ├── requirements.txt
│   ├── .env.example               # Config template
│   ├── routers/
│   │   ├── file_processing.py     # POST /api/denoise/file
│   │   └── realtime.py            # WebSocket /ws/denoise
│   └── services/
│       ├── deepfilter_engine.py   # DeepFilterNet3 GPU inference wrapper
│       └── audio_converter.py     # Format conversion (pydub + soundfile)
├── frontend/                      # Next.js 15 UI (port 8061)
│   ├── app/page.tsx               # Root page
│   ├── hooks/
│   │   ├── useFileDenoiser.ts     # File upload with XHR progress
│   │   ├── useRealtimeDenoiser.ts # WebSocket + AudioWorklet
│   │   └── useAudioRecorder.ts    # Microphone recording
│   ├── components/
│   │   ├── NoiseSuppressor.tsx    # Root container
│   │   ├── file-mode/             # DropZone, RecorderPanel, WaveformViewer
│   │   └── realtime-mode/         # MicButton, LevelMeter
│   └── public/worklets/
│       └── pcm-processor.js       # AudioWorklet for playback buffering
└── realtime/                      # Standalone real-time-only UI (optional)
```
## Backend architecture

**Lifespan:** the DeepFilterNet3 model is loaded once at startup and shared across all requests.
### DeepFilterEngine (`services/deepfilter_engine.py`)

- Wraps the `deepfilternet` library (`df.init_df()`, `df.enhance()`)
- Model weights live on the GPU (~50 MB); audio tensors stay on the CPU
- Native sample rate: 48,000 Hz
- Stateless: each chunk is processed independently
- GPU device selectable via the `GPU_DEVICE` environment variable
### AudioConverter (`services/audio_converter.py`)

- Fast path: `soundfile` for WAV, FLAC, OGG, AIFF
- Fallback: `pydub` + `ffmpeg` for any other format (MP3, M4A, AAC, WebM, etc.)
- Handles resampling (`librosa`/`scipy`) to/from 48 kHz
- Exports to WAV, MP3, OGG, FLAC
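As a sketch, the fast-path/fallback choice boils down to an extension check; `pick_decoder` and the format set below are illustrative stand-ins, not the service's actual code:

```python
import pathlib

# Formats soundfile reads natively, per the fast-path list above.
SOUNDFILE_FORMATS = {".wav", ".flac", ".ogg", ".aiff"}

def pick_decoder(path: str) -> str:
    """Choose the fast soundfile path or the pydub/ffmpeg fallback by extension."""
    ext = pathlib.Path(path).suffix.lower()
    return "soundfile" if ext in SOUNDFILE_FORMATS else "pydub+ffmpeg"
```

Everything outside the `soundfile` set (MP3, M4A, AAC, WebM, ...) falls through to the `ffmpeg`-backed decoder, which is why `ffmpeg` is a host requirement.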
## API reference

Base URL: `https://voiceai.trouve.works/noise/api`
### File-based denoising

```
POST /api/denoise/file
Content-Type: multipart/form-data
```

Parameters:

| Parameter | Type | Default | Notes |
|---|---|---|---|
| `file` | audio file | required | Max 50 MB |
| `output_format` | string | `wav` | One of `wav`, `mp3`, `ogg`, `flac` |
| `restore_sample_rate` | bool | `true` | Restore the original sample rate |

Response: binary audio data with a `Content-Disposition` header.
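A minimal stdlib-only client sketch for this endpoint (the full URL and the `build_multipart` helper are assumptions for illustration; any HTTP client that speaks multipart/form-data works):

```python
import io
import urllib.request
import uuid

API_URL = "https://voiceai.trouve.works/noise/api/denoise/file"  # assumed full path

def build_multipart(filename: str, payload: bytes,
                    output_format: str = "wav",
                    restore_sample_rate: bool = True):
    """Build a multipart/form-data body and Content-Type for the endpoint."""
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    fields = {
        "output_format": output_format,
        "restore_sample_rate": "true" if restore_sample_rate else "false",
    }
    for name, value in fields.items():
        buf.write((f"--{boundary}\r\n"
                   f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                   f"{value}\r\n").encode())
    buf.write((f"--{boundary}\r\n"
               f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
               "Content-Type: application/octet-stream\r\n\r\n").encode())
    buf.write(payload + f"\r\n--{boundary}--\r\n".encode())
    return buf.getvalue(), f"multipart/form-data; boundary={boundary}"

# Usage (network call left commented out):
# body, ctype = build_multipart("noisy.wav", open("noisy.wav", "rb").read(), "flac")
# req = urllib.request.Request(API_URL, data=body, headers={"Content-Type": ctype})
# cleaned_bytes = urllib.request.urlopen(req).read()  # binary audio response
```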
Processing pipeline:

1. Decode audio (any format) to float32 mono
2. Resample to 48 kHz if needed
3. Run through DeepFilterNet3 on the GPU
4. Resample back to the original sample rate (if `restore_sample_rate=true`)
5. Encode to the requested output format
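The resample, denoise, resample-back flow can be sketched with a pluggable stand-in for the model. `naive_resample` here is a toy linear-interpolation resampler for illustration, not the service's `librosa`/`scipy` path:

```python
import numpy as np

TARGET_SR = 48_000  # DeepFilterNet3's native rate

def naive_resample(x: np.ndarray, sr_in: int, sr_out: int) -> np.ndarray:
    """Toy linear-interpolation resampler; only illustrates the rate change."""
    if sr_in == sr_out:
        return x
    n_out = int(round(len(x) * sr_out / sr_in))
    return np.interp(np.linspace(0, len(x) - 1, num=n_out),
                     np.arange(len(x)), x).astype(np.float32)

def denoise_pipeline(audio: np.ndarray, sr: int, denoise_fn,
                     restore_sample_rate: bool = True):
    """Mirror the documented pipeline after decoding, with a pluggable denoiser."""
    x = naive_resample(audio.astype(np.float32), sr, TARGET_SR)  # up to 48 kHz
    y = denoise_fn(x)  # stands in for DeepFilterNet3 GPU inference
    if restore_sample_rate:
        y = naive_resample(y, TARGET_SR, sr)  # back to the caller's rate
    return y, (sr if restore_sample_rate else TARGET_SR)
```

With `restore_sample_rate=False` the caller receives 48 kHz audio regardless of the input rate, which matches the endpoint's documented behavior.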
### Supported formats

```
GET /api/denoise/formats
```

```json
{
  "input": ["wav", "mp3", "ogg", "flac", "m4a", "aac", "webm", "aiff", "opus", "wma"],
  "output": ["wav", "mp3", "ogg", "flac"]
}
```
### Real-time WebSocket

Endpoint: `wss://voiceai.trouve.works/noise/ws/denoise`

1. **Handshake** (client sends a text frame):

   ```json
   {"sample_rate": 48000}
   ```

   Server responds:

   ```json
   {"status": "ready", "sample_rate": 48000}
   ```

2. **Streaming** (binary frames, bidirectional):
   - Client sends Float32 LE PCM chunks (any size)
   - Server returns denoised Float32 LE PCM at the same sample rate

If the client sample rate ≠ 48 kHz, the server transparently resamples in both directions.
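The binary framing is plain little-endian float32, so a client sketch needs only the stdlib to pack and unpack frames. `WS_URL` and `chunk` in the commented session are placeholders, and the session sketch assumes the third-party `websockets` package:

```python
import array
import json
import sys

def pcm_to_frame(samples) -> bytes:
    """Pack float samples as the little-endian Float32 PCM the socket expects."""
    a = array.array("f", samples)
    if sys.byteorder == "big":
        a.byteswap()  # wire format is little-endian
    return a.tobytes()

def frame_to_pcm(frame: bytes):
    """Unpack a denoised binary frame back into float samples."""
    a = array.array("f")
    a.frombytes(frame)
    if sys.byteorder == "big":
        a.byteswap()
    return a.tolist()

HANDSHAKE = json.dumps({"sample_rate": 48000})

# Sketch of a session with the `websockets` package (not run here):
# async with websockets.connect(WS_URL) as ws:
#     await ws.send(HANDSHAKE)
#     ready = json.loads(await ws.recv())   # {"status": "ready", ...}
#     await ws.send(pcm_to_frame(chunk))    # binary frame out
#     clean = frame_to_pcm(await ws.recv()) # denoised frame back
```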
### Health check

```
GET /health
```

```json
{
  "status": "ok",
  "deepfilter_device": "cuda:2"
}
```
## Frontend audio architecture

### File mode pipeline

```text
File / Recording → AudioContext.decodeAudioData() → XHR POST /api/denoise/file
                                                              │
Canvas waveform ← AudioContext.decodeAudioData() ← Response blob
```
### Real-time pipeline

```text
Microphone (getUserMedia)
  → MediaStreamAudioSourceNode
  → AnalyserNode                [frequency bars for level meter]
  → ScriptProcessorNode         (4096 samples, 48 kHz)
  → WebSocket binary frames     (Float32 LE PCM)
  → Server                      (DeepFilterNet3 GPU inference)
  → WebSocket response frames
  → AudioWorklet                (pcm-processor.js, ring buffer, 7200-sample jitter buffer)
  → AudioDestinationNode        (speakers)
```
PCM processor worklet:
- Ring buffer accumulates WebSocket chunks
- Jitter buffer: 7200 samples (~150 ms at 48 kHz) before playback begins
- Outputs 128-sample frames on demand
- Zero-fills on underrun
- Reports buffer level via message port
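The buffering policy above can be modeled in a few lines. This is a Python illustration of the worklet's logic, not the actual JavaScript; the threshold and frame size come from the description above:

```python
from collections import deque

class JitterBufferSketch:
    """Hold output until the jitter threshold is reached, then emit
    fixed-size frames, zero-filling (and re-priming) on underrun."""

    def __init__(self, threshold: int = 7200, frame_size: int = 128):
        self.threshold = threshold    # ~150 ms at 48 kHz by default
        self.frame_size = frame_size  # AudioWorklet render quantum
        self.buf = deque()
        self.started = False

    def push(self, samples) -> None:
        """Called when a denoised WebSocket chunk arrives."""
        self.buf.extend(samples)

    def pull(self):
        """Called by the audio callback for each output frame."""
        if not self.started:
            if len(self.buf) < self.threshold:
                return [0.0] * self.frame_size  # still priming: silence
            self.started = True
        if len(self.buf) < self.frame_size:
            self.started = False  # underrun: zero-fill and re-prime
            return [0.0] * self.frame_size
        return [self.buf.popleft() for _ in range(self.frame_size)]
```

Re-priming after an underrun trades a brief extra delay for continuity, rather than stuttering through a thin buffer sample by sample.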
## Configuration

### Backend (`.env`)

| Variable | Default | Purpose |
|---|---|---|
| `HOST` | `0.0.0.0` | Bind address |
| `PORT` | `8060` | Listen port |
| `GPU_DEVICE` | `cuda:0` | GPU device (`cuda:N` or `cpu`) |
| `MAX_UPLOAD_SIZE_MB` | `50` | Max file upload size |
| `CORS_ORIGINS` | `http://localhost:8061` | Allowed origins |
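A minimal `.env` matching the defaults above:

```ini
HOST=0.0.0.0
PORT=8060
GPU_DEVICE=cuda:0
MAX_UPLOAD_SIZE_MB=50
CORS_ORIGINS=http://localhost:8061
```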
### Frontend (`.env.local`)

| Variable | Purpose |
|---|---|
| `NEXT_PUBLIC_API_URL` | Backend API base URL |
| `NEXT_PUBLIC_WS_URL` | WebSocket server URL |
| `NEXT_PUBLIC_BASE_PATH` | Sub-path for production deployment |
## Key dependencies

### Backend

| Package | Purpose |
|---|---|
| `fastapi >= 0.111.0` | Web framework |
| `websockets >= 12.0` | WebSocket support |
| `deepfilternet` | DeepFilterNet3 model |
| `torch` + `torchaudio` | ML framework (CUDA) |
| `soundfile` | Lossless audio I/O |
| `pydub` | Universal audio decoder (`ffmpeg`) |
| `librosa` | Audio resampling |
### Frontend

| Package | Purpose |
|---|---|
| `next 15.x` | React framework |
| `react 19.x` | UI library |
| `tailwindcss 4.x` | Styling |
:::caution System requirement
ffmpeg must be installed on the host for MP3, M4A, AAC, and other compressed-format support.
:::
## Use cases
- Standalone audio cleanup for podcasts, recordings, and call center audio
- Preprocessing before transcription or voice biometrics — improves accuracy
- Inline preprocessing stage in the Voice Agent pipeline
## Roadmap
- Voice Activity Detection (VAD) integration
- Gain / volume normalization
- Echo cancellation