Whisper STT – Local Speech-to-Text Transcription
Transcribe audio to text locally using OpenAI's Whisper model. Process meetings, podcasts, voice memos, and interviews – 100% offline, no cloud API needed.
Offline Transcription Power
OpenAI's Whisper is the gold standard for speech-to-text. This MCP runs it entirely locally – no cloud API, no subscription, no data leaving your machine. Perfect for confidential meetings, medical notes, or legal recordings.
- 99 Languages: Whisper supports transcription in 99 languages, plus translation into English.
- Multiple Models: tiny (fastest) → large-v3 (most accurate). Choose based on your hardware.
- Speaker Diarization: Identifies who said what in multi-speaker recordings.
- Timestamp Output: Word-level and segment-level timestamps for precise alignment.
Configuration
"mcpServers": {
"whisper": {
"command": "npx",
"args": ["-y", "@mcp/whisper-server"],
"env": { "WHISPER_MODEL": "large-v3" }
}
}Top Prompts
- "Transcribe meeting.mp3 and create action items from what was discussed."
- "Convert this podcast episode to text and summarize the key points."
- "Transcribe all voice memos from today and add them to my daily notes."
Hardware Requirements
| Model | VRAM | Speed | Accuracy |
|---|---|---|---|
| tiny | 1 GB | 10x realtime | Good |
| base | 1.5 GB | 7x realtime | Better |
| medium | 5 GB | 2x realtime | Great |
| large-v3 | 10 GB | 1x realtime | Best |
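Picking a model from the table above can be automated. This is an illustrative helper only (the thresholds mirror the VRAM column; the function name is hypothetical, and the real server simply reads the model name from the `WHISPER_MODEL` env var):

```python
# Pick the most accurate Whisper model that fits the available VRAM,
# using the VRAM figures from the hardware table above. Illustrative
# sketch; not part of the MCP server's API.

MODELS = [           # (model name, required VRAM in GB), smallest first
    ("tiny", 1.0),
    ("base", 1.5),
    ("medium", 5.0),
    ("large-v3", 10.0),
]

def pick_model(vram_gb: float) -> str:
    """Return the largest (most accurate) model that fits in vram_gb."""
    fitting = [name for name, need in MODELS if need <= vram_gb]
    if not fitting:
        raise ValueError(f"{vram_gb} GB is below the 1 GB minimum for tiny")
    return fitting[-1]

print(pick_model(8))   # an 8 GB GPU runs medium, not large-v3
```

Note the trade-off the table encodes: each step up roughly doubles accuracy headroom but halves throughput, so large-v3 at 1x realtime means an hour of audio takes about an hour to transcribe.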