
Whisper STT: Local Speech-to-Text Transcription

Transcribe audio to text locally using OpenAI's Whisper model. Process meetings, podcasts, voice memos, and interviews, all 100% offline with no cloud API needed.

Offline Transcription Power

OpenAI's Whisper is the gold standard for open speech-to-text. This MCP server runs it entirely locally: no cloud API, no subscription, no data leaving your machine. That makes it a good fit for confidential meetings, medical notes, or legal recordings.

  • 99 Languages: Whisper supports transcription in 99 languages, plus translation into English.
  • Multiple Models: tiny (fastest) → large-v3 (most accurate). Choose based on your hardware.
  • Speaker Diarization: Identifies who said what in multi-speaker recordings.
  • Timestamp Output: Word-level and segment-level timestamps for precise alignment.
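To illustrate what segment-level timestamps are good for, here is a minimal sketch (pure Python, with hard-coded example segments standing in for real Whisper output) that formats segments as SRT subtitle cues:

```python
def fmt_ts(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Convert Whisper-style segments (dicts with start/end/text) to SRT text."""
    cues = []
    for i, seg in enumerate(segments, start=1):
        cues.append(
            f"{i}\n{fmt_ts(seg['start'])} --> {fmt_ts(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(cues)

# Illustrative segments, shaped like Whisper's segment output:
segments = [
    {"start": 0.0, "end": 2.4, "text": " Welcome to the meeting."},
    {"start": 2.4, "end": 5.1, "text": " First item: quarterly review."},
]
print(segments_to_srt(segments))
```

Word-level timestamps follow the same pattern, just with one entry per word instead of per segment.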

Configuration

{
  "mcpServers": {
    "whisper": {
      "command": "npx",
      "args": ["-y", "@mcp/whisper-server"],
      "env": { "WHISPER_MODEL": "large-v3" }
    }
  }
}

Top Prompts

  1. "Transcribe meeting.mp3 and create action items from what was discussed."
  2. "Convert this podcast episode to text and summarize the key points."
  3. "Transcribe all voice memos from today and add them to my daily notes."

Hardware Requirements

Model      VRAM     Speed          Accuracy
tiny       1 GB     10x realtime   Good
base       1.5 GB   7x realtime    Better
medium     5 GB     2x realtime    Great
large-v3   10 GB    1x realtime    Best
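The table above can be turned into a small helper that picks the most accurate model fitting a given VRAM budget. The figures come straight from the table; the helper itself is an illustrative sketch, not part of the server:

```python
# VRAM needed per model, in GB, taken from the table above.
# Dict order runs from fastest to most accurate.
MODEL_VRAM_GB = {"tiny": 1.0, "base": 1.5, "medium": 5.0, "large-v3": 10.0}

def pick_model(available_vram_gb: float) -> str:
    """Return the most accurate Whisper model that fits in the given VRAM."""
    fitting = [m for m, need in MODEL_VRAM_GB.items() if need <= available_vram_gb]
    if not fitting:
        raise ValueError("Not enough VRAM for even the tiny model")
    # The last fit is the most accurate one that still fits.
    return fitting[-1]

print(pick_model(6.0))  # a 6 GB GPU fits up to "medium"
```

The chosen name can then be dropped into the WHISPER_MODEL env value in the configuration above.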