Documentation
Run OpenAI Whisper entirely in the browser — no server, no API keys. This guide covers install, required headers, and everyday usage.
Introduction
browser-whisper decodes audio with WebCodecs
(via mediabunny) and runs inference with WebGPU
(via Transformers.js). If WebGPU or WebCodecs is unavailable, the library
falls back to WASM and AudioContext automatically.
Audio moves between two Web Workers over a MessageChannel so the
main thread stays responsive. Results stream back as an async iterator, one
segment at a time.
Requirements
--enable-unsafe-webgpu--enable-features=Vulkan
Your app must also send COOP/COEP headers on every page that loads the library (see below). Without them, SharedArrayBuffer is unavailable and model loading will fail.
Install
npm install browser-whisper
# or
bun add browser-whisper
No peer dependencies — Transformers.js and mediabunny are bundled into the worker at build time.
Server headers
Add these headers on any route that uses browser-whisper:
Cross-Origin-Embedder-Policy: require-corp
Cross-Origin-Opener-Policy: same-origin
Vite
// vite.config.ts
export default defineConfig({
server: {
headers: {
'Cross-Origin-Embedder-Policy': 'require-corp',
'Cross-Origin-Opener-Policy': 'same-origin',
},
},
preview: {
headers: {
'Cross-Origin-Embedder-Policy': 'require-corp',
'Cross-Origin-Opener-Policy': 'same-origin',
},
},
})
Next.js
Set the same headers in next.config.js (or middleware) for routes
that load the library. Import browser-whisper only on the client — it uses
Web Workers and cannot run during SSR.
// next.config.js
const nextConfig = {
async headers() {
return [
{
source: '/:path*',
headers: [
{ key: 'Cross-Origin-Embedder-Policy', value: 'require-corp' },
{ key: 'Cross-Origin-Opener-Policy', value: 'same-origin' },
],
},
]
},
}
module.exports = nextConfig
See the Next.js example for a full app setup.
Quick start
import { BrowserWhisper } from 'browser-whisper'
const whisper = new BrowserWhisper({ model: 'whisper-base' })
const file = document.querySelector('input[type=file]').files[0]
for await (const { text, start, end } of whisper.transcribe(file)) {
console.log(`[${start.toFixed(1)}s] ${text}`)
}
Collect all segments
const segments = await whisper.transcribe(file).collect()
console.log(segments.map(s => s.text).join(' '))
Pre-download a model
await whisper.downloadModel('whisper-small')
Weights are cached in the browser (OPFS) after the first download.
With callbacks
whisper.transcribe(file, {
language: 'en',
onSegment: (seg) => appendToUI(seg),
onProgress: ({ stage, progress }) => {
console.log(stage, Math.round(progress * 100) + '%')
},
})
API
new BrowserWhisper(options?)
| Option | Default | Description |
|---|---|---|
model | whisper-base | Whisper model size |
language | auto | BCP-47 code, e.g. en |
quantization | hybrid | fp32, fp16, q8, q4, or hybrid |
whisper.transcribe(file, options?)
Returns a TranscribeStream (async iterable). Override
model, language, or quantization per call.
Use onSegment and onProgress for callbacks.
whisper.transcribePCM(samples)
Transcribe a mono Float32Array at 16 kHz (e.g. from a VAD pipeline).
Skips file decoding.
whisper.downloadModel(model)
Download and cache a model without transcribing. Supports
onProgress and AbortSignal.
BrowserWhisper.clearCache() / deleteModel(model)
Remove cached model weights from the browser.
TranscriptSegment
{ text: string, start: number, end: number }
start and end are in seconds from the beginning of the file.
Full API details are in the README on GitHub.
Models
Download sizes use default hybrid quantization (encoder fp32 + decoder q4).
| Model | Size | Notes |
|---|---|---|
moonshine-tiny | ~32 MB | English only · fastest |
moonshine-base | ~61 MB | English only |
whisper-tiny | ~64 MB | Multilingual · fastest Whisper |
whisper-base | ~136 MB | Multilingual · default |
whisper-small | ~510 MB | Multilingual · better accuracy |
whisper-tiny_timestamped | ~118 MB | Word-level timestamps |
whisper-base_timestamped | ~201 MB | Word-level timestamps |
whisper-small_timestamped | ~563 MB | Word-level timestamps |
lite-whisper-large-v3-turbo-fast | ~1.49 GB | Large · speed-optimized |
lite-whisper-large-v3-turbo | ~1.72 GB | Large · balanced |
lite-whisper-large-v3-turbo-acc | ~1.89 GB | Large · accuracy-optimized |
whisper-large-v3-turbo | ~2.69 GB | Best turbo accuracy |
whisper-large-v3-turbo_timestamped | ~2.69 GB | Turbo + word timestamps |
whisper-large-v3 | ~3.12 GB | Highest accuracy |
distil-whisper-small | ~185 MB | English only · Distil |
Models are fetched from Hugging Face (onnx-community) on first run,
then cached locally.
Browser support
WebGPU and WebCodecs are preferred; fallbacks run automatically when needed.
| Browser | WebGPU | WebCodecs |
|---|---|---|
| Chrome | 113+ | 94+ |
| Firefox | 141+ | 130+ |
| Safari | 18+ | 16.4+ |
First run needs network access for WASM (~1 MB from CDN) and model weights. After caching, transcription works offline.
Try the live demo · All demos · Vite example · Next.js example