Documentation

Run OpenAI Whisper entirely in the browser — no server, no API keys. This guide covers install, required headers, and everyday usage.

Introduction

browser-whisper decodes audio with WebCodecs (via mediabunny) and runs inference with WebGPU (via Transformers.js). If WebGPU or WebCodecs is unavailable, the library falls back to WASM and AudioContext automatically.

Audio moves between two Web Workers over a MessageChannel so the main thread stays responsive. Results stream back as an async iterator, one segment at a time.

Requirements

WebGPU required. Use a Chromium-based browser (Chrome, Edge, Brave, Arc) with these flags enabled:

--enable-unsafe-webgpu
--enable-features=Vulkan

Your app must also send COOP/COEP headers on every page that loads the library (see below). Without them, SharedArrayBuffer is unavailable and model loading will fail.
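A quick way to confirm the headers took effect is to check the standard `crossOriginIsolated` global before loading a model. The sketch below is illustrative only: `detectCapabilities` is a hypothetical helper, not part of browser-whisper, and the library's own detection logic may differ. It relies only on standard browser globals (`crossOriginIsolated`, `SharedArrayBuffer`, `navigator.gpu`, `AudioDecoder`).

```javascript
// Hypothetical helper (not part of browser-whisper): report which
// platform features are present. `env` defaults to globalThis; pass a
// stub object to exercise it outside a browser.
function detectCapabilities(env = globalThis) {
  const nav = env.navigator ?? {};
  return {
    // true only when the page was served with the COOP/COEP headers
    crossOriginIsolated: env.crossOriginIsolated === true,
    // SharedArrayBuffer is exposed only on cross-origin isolated pages
    sharedArrayBuffer: typeof env.SharedArrayBuffer === 'function',
    // navigator.gpu is the WebGPU entry point
    webgpu: nav.gpu != null,
    // AudioDecoder is the WebCodecs audio decoding interface
    webcodecs: typeof env.AudioDecoder === 'function',
  };
}
```

Calling `detectCapabilities()` in the browser console and seeing `crossOriginIsolated: false` usually means one of the two headers is missing on that route.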

Install

npm install browser-whisper
# or
bun add browser-whisper

No peer dependencies — Transformers.js and mediabunny are bundled into the worker at build time.

Server headers

Add these headers on any route that uses browser-whisper:

Cross-Origin-Embedder-Policy: require-corp
Cross-Origin-Opener-Policy: same-origin
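For a server that is not covered by the framework sections that follow, the same two headers can be set by hand. This is a minimal sketch, assuming a response object with a `setHeader(name, value)` method (Node's `http.ServerResponse` and Express responses both have one). `ISOLATION_HEADERS` and `applyIsolationHeaders` are hypothetical names used for illustration.

```javascript
// The two headers from this section, as a reusable map.
const ISOLATION_HEADERS = {
  'Cross-Origin-Embedder-Policy': 'require-corp',
  'Cross-Origin-Opener-Policy': 'same-origin',
};

// Hypothetical helper: set both headers on any response object that
// exposes setHeader(name, value).
function applyIsolationHeaders(res) {
  for (const [name, value] of Object.entries(ISOLATION_HEADERS)) {
    res.setHeader(name, value);
  }
  return res;
}

// Usage with Node's built-in server (not executed here):
//   const http = require('node:http');
//   http.createServer((req, res) => {
//     applyIsolationHeaders(res);
//     res.end('ok');
//   }).listen(3000);
```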

Vite

// vite.config.ts
import { defineConfig } from 'vite'

export default defineConfig({
  // Headers for the dev server (vite dev)
  server: {
    headers: {
      'Cross-Origin-Embedder-Policy': 'require-corp',
      'Cross-Origin-Opener-Policy': 'same-origin',
    },
  },
  // Same headers for the production preview server (vite preview)
  preview: {
    headers: {
      'Cross-Origin-Embedder-Policy': 'require-corp',
      'Cross-Origin-Opener-Policy': 'same-origin',
    },
  },
})

Next.js

Set the same headers in next.config.js (or middleware) for routes that load the library. Import browser-whisper only on the client — it uses Web Workers and cannot run during SSR.

// next.config.js
const nextConfig = {
  async headers() {
    return [
      {
        source: '/:path*',
        headers: [
          { key: 'Cross-Origin-Embedder-Policy', value: 'require-corp' },
          { key: 'Cross-Origin-Opener-Policy', value: 'same-origin' },
        ],
      },
    ]
  },
}
module.exports = nextConfig

See the Next.js example for a full app setup.

Quick start

import { BrowserWhisper } from 'browser-whisper'

const whisper = new BrowserWhisper({ model: 'whisper-base' })
const file = document.querySelector('input[type=file]').files[0]

for await (const { text, start, end } of whisper.transcribe(file)) {
  console.log(`[${start.toFixed(1)}s] ${text}`)
}

Collect all segments

const segments = await whisper.transcribe(file).collect()
console.log(segments.map(s => s.text).join(' '))

Pre-download a model

await whisper.downloadModel('whisper-small')

Weights are cached in the browser (OPFS) after the first download.

With callbacks

whisper.transcribe(file, {
  language: 'en',
  onSegment: (seg) => appendToUI(seg),
  onProgress: ({ stage, progress }) => {
    console.log(stage, Math.round(progress * 100) + '%')
  },
})

API

`new BrowserWhisper(options?)`

Option	Default	Description
`model`	`whisper-base`	Whisper model size
`language`	auto	BCP-47 code, e.g. `en`
`quantization`	`hybrid`	fp32, fp16, q8, q4, or hybrid

`whisper.transcribe(file, options?)`

Returns a TranscribeStream (async iterable). Override model, language, or quantization per call. Use onSegment and onProgress for callbacks.

`whisper.transcribePCM(samples)`

Transcribe a mono Float32Array at 16 kHz (e.g. from a VAD pipeline). Skips file decoding.
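Audio captured in the browser is often stereo at 44.1 or 48 kHz, so it has to be converted before it matches the mono 16 kHz input described here. The sketch below shows one simple way to do that. `downmixToMono` and `resampleLinear` are hypothetical helpers, not part of browser-whisper, and linear interpolation is used only for brevity: it applies no anti-aliasing filter, so a proper resampler is preferable when quality matters.

```javascript
// Average separate channel arrays (e.g. left and right) into one mono signal.
function downmixToMono(channels) {
  const length = channels[0].length;
  const mono = new Float32Array(length);
  for (const channel of channels) {
    for (let i = 0; i < length; i++) {
      mono[i] += channel[i] / channels.length;
    }
  }
  return mono;
}

// Resample by linear interpolation between neighbouring input samples.
function resampleLinear(samples, fromRate, toRate = 16000) {
  if (fromRate === toRate) return samples;
  const outLength = Math.round((samples.length * toRate) / fromRate);
  const out = new Float32Array(outLength);
  const step = fromRate / toRate; // input samples per output sample
  const last = samples.length - 1;
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const left = Math.min(Math.floor(pos), last);
    const right = Math.min(left + 1, last);
    const frac = pos - left;
    out[i] = samples[left] * (1 - frac) + samples[right] * frac;
  }
  return out;
}
```

Under these assumptions, a stereo 48 kHz capture could be passed along as `whisper.transcribePCM(resampleLinear(downmixToMono([left, right]), 48000))`.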

`whisper.downloadModel(model)`

Download and cache a model without transcribing. Supports onProgress and AbortSignal.

`BrowserWhisper.clearCache()` / `deleteModel(model)`

Remove cached model weights from the browser.

`TranscriptSegment`

{ text: string, start: number, end: number }

start and end are in seconds from the beginning of the file.
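Because each segment carries its text plus start and end times in seconds, a list of segments converts directly into a subtitle file. The following sketch formats segments as SubRip (SRT). `toSrtTime` and `segmentsToSrt` are hypothetical helpers written for this guide, not library functions.

```javascript
// Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm
function toSrtTime(seconds) {
  const totalMs = Math.max(0, Math.round(seconds * 1000));
  const ms = totalMs % 1000;
  const s = Math.floor(totalMs / 1000) % 60;
  const m = Math.floor(totalMs / 60000) % 60;
  const h = Math.floor(totalMs / 3600000);
  const pad = (n, width = 2) => String(n).padStart(width, '0');
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`;
}

// Turn an array of { text, start, end } segments into SRT cues,
// numbered from 1 and separated by a blank line.
function segmentsToSrt(segments) {
  return segments
    .map((seg, i) =>
      `${i + 1}\n${toSrtTime(seg.start)} --> ${toSrtTime(seg.end)}\n${seg.text.trim()}\n`)
    .join('\n');
}
```

The array returned by `collect()` can be passed straight to `segmentsToSrt`.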

Full API details are in the README on GitHub.

Models

Download sizes use default hybrid quantization (encoder fp32 + decoder q4).

Model	Size	Notes
`moonshine-tiny`	~32 MB	English only · fastest
`moonshine-base`	~61 MB	English only
`whisper-tiny`	~64 MB	Multilingual · fastest Whisper
`whisper-base`	~136 MB	Multilingual · default
`whisper-small`	~510 MB	Multilingual · better accuracy
`whisper-tiny_timestamped`	~118 MB	Word-level timestamps
`whisper-base_timestamped`	~201 MB	Word-level timestamps
`whisper-small_timestamped`	~563 MB	Word-level timestamps
`lite-whisper-large-v3-turbo-fast`	~1.49 GB	Large · speed-optimized
`lite-whisper-large-v3-turbo`	~1.72 GB	Large · balanced
`lite-whisper-large-v3-turbo-acc`	~1.89 GB	Large · accuracy-optimized
`whisper-large-v3-turbo`	~2.69 GB	Best turbo accuracy
`whisper-large-v3-turbo_timestamped`	~2.69 GB	Turbo + word timestamps
`whisper-large-v3`	~3.12 GB	Highest accuracy
`distil-whisper-small`	~185 MB	English only · Distil

Models are fetched from Hugging Face (onnx-community) on first run, then cached locally.
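When the download budget varies by device or connection, the table above can drive model selection. The sketch below is one possible approach, not a library feature: `MODEL_SIZES_MB` and `pickModel` are hypothetical names, the sizes are the approximate figures from the table, and only rows whose language coverage the table states are included. It assumes that, within this list, a larger download is the better choice when it fits.

```javascript
// Approximate download sizes in MB, copied from the table above.
// Only rows with a stated language coverage are listed.
const MODEL_SIZES_MB = [
  { model: 'moonshine-tiny', sizeMB: 32, multilingual: false },
  { model: 'moonshine-base', sizeMB: 61, multilingual: false },
  { model: 'whisper-tiny', sizeMB: 64, multilingual: true },
  { model: 'whisper-base', sizeMB: 136, multilingual: true },
  { model: 'distil-whisper-small', sizeMB: 185, multilingual: false },
  { model: 'whisper-small', sizeMB: 510, multilingual: true },
];

// Return the name of the largest listed model within the budget,
// or null when nothing fits.
function pickModel(budgetMB, needMultilingual = false) {
  const candidates = MODEL_SIZES_MB
    .filter((m) => m.sizeMB <= budgetMB)
    .filter((m) => !needMultilingual || m.multilingual)
    .sort((a, b) => b.sizeMB - a.sizeMB);
  return candidates.length > 0 ? candidates[0].model : null;
}
```

The result can be passed as the `model` option, for example `new BrowserWhisper({ model: pickModel(150, true) ?? 'whisper-tiny' })`.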

Browser support

WebGPU and WebCodecs are preferred; fallbacks run automatically when needed.

Browser	WebGPU	WebCodecs
Chrome	113+	94+
Firefox	141+	130+
Safari	18+	16.4+

First run needs network access for WASM (~1 MB from CDN) and model weights. After caching, transcription works offline.

