
Whisper STT: Local Speech-to-Text Transcription

Transcribe audio to text locally using OpenAI's Whisper model. Process meetings, podcasts, voice memos, and interviews, all 100% offline with no cloud API needed.

Offline Transcription Power

OpenAI's Whisper is the gold standard for open speech-to-text. This MCP server runs it entirely locally: no cloud API, no subscription, no data leaving your machine. That makes it a good fit for confidential meetings, medical notes, or legal recordings.

  • 99 Languages: Whisper supports transcription in 99 languages, plus translation into English.
  • Multiple Models: tiny (fastest) → large-v3 (most accurate). Choose based on your hardware.
  • Speaker Diarization: Identifies who said what in multi-speaker recordings.
  • Timestamp Output: Word-level and segment-level timestamps for precise alignment.
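To illustrate what segment-level timestamps are good for, here is a minimal sketch (pure Python, with hard-coded example segments standing in for real Whisper output) that formats segments as SRT subtitle cues:

```python
def fmt_ts(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Convert Whisper-style segments (dicts with start/end/text) to SRT text."""
    cues = []
    for i, seg in enumerate(segments, start=1):
        cues.append(
            f"{i}\n{fmt_ts(seg['start'])} --> {fmt_ts(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(cues)

# Illustrative segments, shaped like Whisper's segment output:
segments = [
    {"start": 0.0, "end": 2.4, "text": " Welcome to the meeting."},
    {"start": 2.4, "end": 5.1, "text": " First item: quarterly review."},
]
print(segments_to_srt(segments))
```

Word-level timestamps follow the same pattern, just with one entry per word instead of per segment.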

Configuration

{
  "mcpServers": {
    "whisper": {
      "command": "npx",
      "args": ["-y", "@mcp/whisper-server"],
      "env": { "WHISPER_MODEL": "large-v3" }
    }
  }
}

Top Prompts

  1. "Transcribe meeting.mp3 and create action items from what was discussed."
  2. "Convert this podcast episode to text and summarize the key points."
  3. "Transcribe all voice memos from today and add them to my daily notes."

Hardware Requirements

Model      VRAM     Speed          Accuracy
tiny       1 GB     10x realtime   Good
base       1.5 GB   7x realtime    Better
medium     5 GB     2x realtime    Great
large-v3   10 GB    1x realtime    Best
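The table above can be turned into a small helper that picks the most accurate model fitting a given VRAM budget. The figures come straight from the table; the helper itself is an illustrative sketch, not part of the server:

```python
# VRAM needed per model, in GB, taken from the table above.
# Dict order runs from fastest to most accurate.
MODEL_VRAM_GB = {"tiny": 1.0, "base": 1.5, "medium": 5.0, "large-v3": 10.0}

def pick_model(available_vram_gb: float) -> str:
    """Return the most accurate Whisper model that fits in the given VRAM."""
    fitting = [m for m, need in MODEL_VRAM_GB.items() if need <= available_vram_gb]
    if not fitting:
        raise ValueError("Not enough VRAM for even the tiny model")
    # The last fit is the most accurate one that still fits.
    return fitting[-1]

print(pick_model(6.0))  # a 6 GB GPU fits up to "medium"
```

The chosen name can then be dropped into the WHISPER_MODEL env value in the configuration above.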