Zeraku
Runs in browser · Est. 1–3× real-time · New

Audio Transcription

Transcribe audio to text with Whisper-level accuracy, free and private

Try It Now — Free

What is Audio Transcription?

Convert speech to text with high accuracy using a browser-based Whisper model. Supports 99 languages, generates timestamped transcripts, and exports to SRT, VTT, or plain text. No upload, no account.

Key Features

Powered by OpenAI Whisper (tiny/base/small models in WebAssembly)

Supports 99 languages with automatic language detection

Word-level timestamps for precise navigation

Export to SRT, VTT, and plain TXT formats

Speaker diarization (beta)

Real-time transcription for microphone input

Edit and correct transcript in-browser

Accepts MP3, WAV, M4A, FLAC, OGG, and WebM files up to 500 MB (files are read locally, not uploaded)

How It Works

1

Load Model

On first use, the Whisper model (approx. 150 MB) is downloaded once and cached in your browser.

2

Upload Audio

Drop your audio file or record directly from your microphone.

3

Transcription

The model processes your audio locally in chunks, producing timestamped text segments.

4

Review & Export

Read, edit, and search your transcript, then export in your preferred format.
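
For readers curious how timestamped segments become subtitle files, here is a minimal sketch in TypeScript. The `Segment` shape and the function names are hypothetical, chosen for illustration; the tool's actual data structures are not documented on this page. The one format detail it relies on is that SRT writes a comma before the milliseconds, while WebVTT writes a period and starts the file with a `WEBVTT` header.

```typescript
// Hypothetical segment shape (an assumption, not the tool's real type).
interface Segment {
  startMs: number; // segment start in milliseconds
  endMs: number;   // segment end in milliseconds
  text: string;
}

// Left-pad a non-negative integer with zeros to the given width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format milliseconds as HH:MM:SS<sep>mmm.
// SRT uses "," before the milliseconds, WebVTT uses ".".
function formatTimestamp(ms: number, sep: "," | "."): string {
  const total = Math.max(0, Math.round(ms));
  const h = Math.floor(total / 3600000);
  const m = Math.floor((total % 3600000) / 60000);
  const s = Math.floor((total % 60000) / 1000);
  const milli = total % 1000;
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}${sep}${pad(milli, 3)}`;
}

// SubRip: numbered cues, each followed by a blank line.
function toSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) => {
      const range = `${formatTimestamp(seg.startMs, ",")} --> ${formatTimestamp(seg.endMs, ",")}`;
      return `${i + 1}\n${range}\n${seg.text.trim()}\n`;
    })
    .join("\n");
}

// WebVTT: "WEBVTT" header, a blank line, then unnumbered cues.
function toVtt(segments: Segment[]): string {
  const cues = segments.map((seg) => {
    const range = `${formatTimestamp(seg.startMs, ".")} --> ${formatTimestamp(seg.endMs, ".")}`;
    return `${range}\n${seg.text.trim()}\n`;
  });
  return "WEBVTT\n\n" + cues.join("\n");
}
```

Calling `toSrt(segments)` on the transcript's segments yields text that can be saved with a `.srt` extension; `toVtt(segments)` does the same for `.vtt`.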

Who Is This For?

  • Journalists transcribing interviews
  • Podcasters creating show notes and subtitles
  • Students transcribing lectures
  • Content creators generating captions for videos
  • Researchers transcribing qualitative data

Why Use Audio Transcription?

Cloud transcription services such as Otter.ai and Descript send your audio to remote servers, which is a serious privacy risk for confidential meetings, medical consultations, or personal recordings. Zeraku's Audio Transcription runs the entire Whisper model inside your browser using WebAssembly, so your recordings never leave your device.

Unlike cloud tools that charge per minute or lock features behind subscriptions, Zeraku is completely free: no account, no usage limits, and no file-size cap other than 500 MB per file.

Need to transcribe offline? After the first model download (approx. 150 MB), the tool works fully offline from the second visit onwards, so it stays reliable even on a train or plane.

Typical use cases include:

  • Creating meeting minutes from Zoom or Google Meet recordings
  • Generating SRT subtitle files for YouTube uploads
  • Transcribing interviews for journalism or qualitative research
  • Converting lectures and seminars to searchable text

Audio Transcription is powered by OpenAI's open-source Whisper speech recognition model and supports 99 languages with automatic language detection.
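
The download-once, reuse-offline behaviour described above could be sketched roughly as follows. This is an illustration under stated assumptions, not Zeraku's actual code: `ModelStore`, `Downloader`, `loadModel`, and `createMemoryStore` are hypothetical names. In a browser, the store would typically be backed by the Cache API (`caches.open`, `cache.match`, `cache.put`); an in-memory store stands in here so the logic can be tried anywhere.

```typescript
// Hypothetical key-value store for model bytes. In a browser this could
// wrap the Cache API; here it is kept abstract on purpose.
interface ModelStore {
  get(key: string): Promise<Uint8Array | undefined>;
  put(key: string, bytes: Uint8Array): Promise<void>;
}

// Hypothetical downloader: resolves to the raw model bytes for a URL.
type Downloader = (url: string) => Promise<Uint8Array>;

// Return the model bytes, downloading only when the store has no copy yet.
async function loadModel(
  url: string,
  store: ModelStore,
  download: Downloader
): Promise<Uint8Array> {
  const cached = await store.get(url);
  if (cached !== undefined) {
    return cached; // later visits: served locally, no network needed
  }
  const bytes = await download(url); // first visit: one network fetch
  await store.put(url, bytes);
  return bytes;
}

// In-memory stand-in for a persistent store, for experimentation only.
function createMemoryStore(): ModelStore {
  const entries: { [key: string]: Uint8Array } = {};
  return {
    get: async (key) => entries[key],
    put: async (key, bytes) => {
      entries[key] = bytes;
    },
  };
}
```

With a persistent store, the second call to `loadModel` for the same URL never reaches the network, which is what makes offline use possible after the first visit.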

How Zeraku Compares to Cloud Services

Most transcription tools upload your audio to remote servers before processing it. Zeraku processes everything in your browser, so your data never leaves your device.

Feature | Zeraku | Service A | Service B
Completely free | Yes | Up to 3/month | Up to 10 min
Privacy (no data upload) | Browser-only | Server upload | Server upload
No account required | Yes | Account required | Account required
Works offline (2nd visit+) | Yes (cached model) | No (always online) | No (always online)
Supported languages | 99 languages (auto-detect) | 58 languages | 100+ languages (paid plan only)
SRT/VTT subtitle export | Free | Paid plan only | Paid plan only
Speaker diarization | Beta | Paid plan only | Paid plan only
Max file size (free) | 500 MB | 25 MB (free plan) | 100 MB (free plan)

Beginner's Guide

Audio transcription converts spoken words into written text automatically. It is useful for turning meeting recordings into minutes, adding subtitles to videos, writing up interviews, or summarising lectures. To use it, choose your audio file (MP3, WAV, M4A, FLAC, OGG, or WebM), select the spoken language or leave it on automatic detection, and press Start. The tool transcribes the audio and lets you download the result as a text file (.txt) or as a subtitle file (.srt or .vtt); the .srt file can be uploaded directly to YouTube. No technical knowledge is required.

Technical Details

The tool runs whisper.cpp, compiled to WebAssembly via Emscripten, inside a dedicated Web Worker so the UI stays responsive during transcription. Audio is decoded with the Web Audio API, and every input file is normalised to 16 kHz mono PCM before processing, regardless of source format. Long recordings are split automatically into 30-second sliding windows with a 1-second overlap so that words at window boundaries are not cut off. The Whisper small model (~150 MB) is fetched once on first use and stored via the browser Cache API, which enables fully offline operation from the second session onwards. Speaker diarisation (beta) applies a lightweight spectral clustering algorithm to audio embeddings to label speaker turns. Output is available as plain text (.txt), SubRip (.srt), or WebVTT (.vtt); subtitle timestamps are written with millisecond resolution.
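
As a rough illustration of the windowing step, the sketch below plans 30-second windows with a 1-second overlap over a 16 kHz sample buffer. It only computes sample ranges; how the tool merges the overlapping text is not described here. `planChunks` and `ChunkRange` are hypothetical names, and the defaults simply mirror the figures quoted above.

```typescript
const SAMPLE_RATE = 16000;  // 16 kHz mono, as stated above
const WINDOW_SECONDS = 30;  // window length quoted above
const OVERLAP_SECONDS = 1;  // overlap between consecutive windows

interface ChunkRange {
  start: number; // first sample index (inclusive)
  end: number;   // last sample index (exclusive)
}

// Plan overlapping windows over a recording of `totalSamples` samples.
// Each window starts (window - overlap) seconds after the previous one.
function planChunks(
  totalSamples: number,
  sampleRate: number = SAMPLE_RATE,
  windowSeconds: number = WINDOW_SECONDS,
  overlapSeconds: number = OVERLAP_SECONDS
): ChunkRange[] {
  const windowSize = windowSeconds * sampleRate;
  const step = (windowSeconds - overlapSeconds) * sampleRate;
  if (step <= 0) {
    throw new Error("overlap must be shorter than the window");
  }
  const chunks: ChunkRange[] = [];
  for (let start = 0; start < totalSamples; start += step) {
    const end = Math.min(start + windowSize, totalSamples);
    chunks.push({ start, end });
    if (end === totalSamples) break; // this window already reaches the end
  }
  return chunks;
}
```

Given a `Float32Array` of decoded PCM named `pcm`, each planned range can be passed on as `pcm.subarray(range.start, range.end)`, which creates a view without copying the samples.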

Ready to try Audio Transcription?

Open Audio Transcription (Free)