Documentation

Run OpenAI Whisper entirely in the browser — no server, no API keys. This guide covers install, required headers, and everyday usage.

Introduction

browser-whisper decodes audio with WebCodecs (via mediabunny) and runs inference with WebGPU (via Transformers.js). If WebGPU or WebCodecs is unavailable, the library falls back to WASM and AudioContext automatically.

Audio moves between two Web Workers over a MessageChannel so the main thread stays responsive. Results stream back as an async iterator, one segment at a time.

Requirements

WebGPU required. Use a Chromium-based browser (Chrome, Edge, Brave, Arc) with these flags enabled:
  • --enable-unsafe-webgpu
  • --enable-features=Vulkan

Your app must also send COOP/COEP headers on every page that loads the library (see below). Without them, SharedArrayBuffer is unavailable and model loading will fail.

Install

npm install browser-whisper
# or
bun add browser-whisper

No peer dependencies — Transformers.js and mediabunny are bundled into the worker at build time.

Server headers

Add these headers on any route that uses browser-whisper:

Cross-Origin-Embedder-Policy: require-corp
Cross-Origin-Opener-Policy: same-origin

Vite

// vite.config.ts
export default defineConfig({
  server: {
    headers: {
      'Cross-Origin-Embedder-Policy': 'require-corp',
      'Cross-Origin-Opener-Policy': 'same-origin',
    },
  },
  preview: {
    headers: {
      'Cross-Origin-Embedder-Policy': 'require-corp',
      'Cross-Origin-Opener-Policy': 'same-origin',
    },
  },
})

Next.js

Set the same headers in next.config.js (or middleware) for routes that load the library. Import browser-whisper only on the client — it uses Web Workers and cannot run during SSR.

// next.config.js
const nextConfig = {
  async headers() {
    return [
      {
        source: '/:path*',
        headers: [
          { key: 'Cross-Origin-Embedder-Policy', value: 'require-corp' },
          { key: 'Cross-Origin-Opener-Policy', value: 'same-origin' },
        ],
      },
    ]
  },
}
module.exports = nextConfig

See the Next.js example for a full app setup.

Quick start

import { BrowserWhisper } from 'browser-whisper'

const whisper = new BrowserWhisper({ model: 'whisper-base' })
const file = document.querySelector('input[type=file]').files[0]

for await (const { text, start, end } of whisper.transcribe(file)) {
  console.log(`[${start.toFixed(1)}s] ${text}`)
}

Collect all segments

const segments = await whisper.transcribe(file).collect()
console.log(segments.map(s => s.text).join(' '))

Pre-download a model

await whisper.downloadModel('whisper-small')

Weights are cached in the browser (OPFS) after the first download.

With callbacks

whisper.transcribe(file, {
  language: 'en',
  onSegment: (seg) => appendToUI(seg),
  onProgress: ({ stage, progress }) => {
    console.log(stage, Math.round(progress * 100) + '%')
  },
})

API

new BrowserWhisper(options?)

OptionDefaultDescription
modelwhisper-baseWhisper model size
languageautoBCP-47 code, e.g. en
quantizationhybridfp32, fp16, q8, q4, or hybrid

whisper.transcribe(file, options?)

Returns a TranscribeStream (async iterable). Override model, language, or quantization per call. Use onSegment and onProgress for callbacks.

whisper.transcribePCM(samples)

Transcribe a mono Float32Array at 16 kHz (e.g. from a VAD pipeline). Skips file decoding.

whisper.downloadModel(model)

Download and cache a model without transcribing. Supports onProgress and AbortSignal.

BrowserWhisper.clearCache() / deleteModel(model)

Remove cached model weights from the browser.

TranscriptSegment

{ text: string, start: number, end: number }

start and end are in seconds from the beginning of the file.

Full API details are in the README on GitHub.

Models

Download sizes use default hybrid quantization (encoder fp32 + decoder q4).

ModelSizeNotes
moonshine-tiny~32 MBEnglish only · fastest
moonshine-base~61 MBEnglish only
whisper-tiny~64 MBMultilingual · fastest Whisper
whisper-base~136 MBMultilingual · default
whisper-small~510 MBMultilingual · better accuracy
whisper-tiny_timestamped~118 MBWord-level timestamps
whisper-base_timestamped~201 MBWord-level timestamps
whisper-small_timestamped~563 MBWord-level timestamps
lite-whisper-large-v3-turbo-fast~1.49 GBLarge · speed-optimized
lite-whisper-large-v3-turbo~1.72 GBLarge · balanced
lite-whisper-large-v3-turbo-acc~1.89 GBLarge · accuracy-optimized
whisper-large-v3-turbo~2.69 GBBest turbo accuracy
whisper-large-v3-turbo_timestamped~2.69 GBTurbo + word timestamps
whisper-large-v3~3.12 GBHighest accuracy
distil-whisper-small~185 MBEnglish only · Distil

Models are fetched from Hugging Face (onnx-community) on first run, then cached locally.

Browser support

WebGPU and WebCodecs are preferred; fallbacks run automatically when needed.

BrowserWebGPUWebCodecs
Chrome113+94+
Firefox141+130+
Safari18+16.4+

First run needs network access for WASM (~1 MB from CDN) and model weights. After caching, transcription works offline.


Try the live demo · All demos · Vite example · Next.js example