Getting Started
System Requirements
| Component | Requirement |
|---|---|
| OS | Windows 10 (1903+) or Windows 11 |
| RAM | 4 GB minimum (8 GB recommended) |
| Disk | 200 MB plus model files (75 MB to 2.9 GB each) |
| Audio | Any microphone input device |
Installation
1. Download
2. Install
3. First model
4. Microphone access
First Transcription
1. Open any text editor, IDE, or input field.
2. Press Ctrl+Alt+Space to start recording (or hold Ctrl+Space for push-to-talk).
3. Speak naturally into your microphone.
4. Press Ctrl+Alt+Space again (or release Ctrl+Space). Your speech is transcribed locally and pasted at your cursor.
How It Works
SpeakToCode runs entirely on your local machine. No cloud API calls, no audio uploads, no third-party services.
Record
Your microphone captures audio locally as a temporary WAV file.
Transcribe
Audio is processed by Whisper.cpp running natively on your CPU/GPU.
Paste
Transcribed text is copied to clipboard and auto-pasted at your cursor.
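The record and transcribe steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not SpeakToCode's actual code: the function names, the `whisper-cli` binary name, and the model path are hypothetical. The 16 kHz mono 16-bit format is what whisper.cpp expects as input, and `-m`, `-f`, `-l`, and `-nt` are standard whisper.cpp command-line options.

```python
import struct
import tempfile
import wave
from pathlib import Path

SAMPLE_RATE = 16_000  # whisper.cpp expects 16 kHz, 16-bit, mono WAV input


def write_temp_wav(samples: list[int]) -> Path:
    """Write 16-bit mono PCM samples to a temporary WAV file and return its path."""
    tmp = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
    tmp.close()  # close first so the file can be reopened on Windows
    with wave.open(tmp.name, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return Path(tmp.name)


def whisper_command(model_path: str, wav_path: Path, language: str = "auto") -> list[str]:
    """Build a whisper.cpp command line (binary name and paths are assumptions)."""
    return [
        "whisper-cli",
        "-m", model_path,      # model file
        "-f", str(wav_path),   # input audio
        "-l", language,        # spoken language, or "auto" to detect
        "-nt",                 # plain text output, no timestamps
    ]


if __name__ == "__main__":
    wav_path = write_temp_wav([0] * SAMPLE_RATE)  # one second of silence
    print(whisper_command("models/ggml-base.bin", wav_path))
    wav_path.unlink()  # the recording is temporary, so remove it afterwards
```

The command is only built here, not executed, so the sketch runs without whisper.cpp installed.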
Keyboard Shortcuts
| Shortcut | Action |
|---|---|
| Ctrl+Alt+Space | Toggle recording on/off |
| Ctrl+Space | Push-to-talk: hold to record, release to transcribe |
Shortcuts work globally from any application. Customize them in Settings.
Recording Modes
Toggle Mode
Press Ctrl+Alt+Space to start. Press again to stop and transcribe. Best for longer dictation sessions.
Push-to-Talk
Hold Ctrl+Space while speaking. Release to transcribe. Best for quick bursts like variable names, commit messages, short comments.
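The difference between the two modes comes down to which key events change the recording state. The sketch below models that as a small state machine. The `Mode` and `Recorder` names and the returned action strings are hypothetical, chosen for illustration; the app's internal design is not documented here.

```python
from enum import Enum


class Mode(Enum):
    TOGGLE = "toggle"              # Ctrl+Alt+Space by default
    PUSH_TO_TALK = "push_to_talk"  # Ctrl+Space by default


class Recorder:
    """Tracks whether recording is active and names the action for each hotkey event."""

    def __init__(self) -> None:
        self.recording = False

    def on_key(self, mode: Mode, pressed: bool) -> str:
        if mode is Mode.TOGGLE:
            if not pressed:
                return "ignore"  # toggle mode reacts to key-down only
            self.recording = not self.recording
            return "start" if self.recording else "stop_and_transcribe"

        # Push-to-talk: key-down starts, key-up stops.
        if pressed and not self.recording:
            self.recording = True
            return "start"
        if not pressed and self.recording:
            self.recording = False
            return "stop_and_transcribe"
        return "ignore"  # e.g. key-repeat events while the key is held


if __name__ == "__main__":
    recorder = Recorder()
    print(recorder.on_key(Mode.PUSH_TO_TALK, pressed=True))   # start
    print(recorder.on_key(Mode.PUSH_TO_TALK, pressed=False))  # stop_and_transcribe
```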
Whisper Models
SpeakToCode supports all 7 official Whisper models. Larger models are more accurate but slower.
| Model | Size | Speed | Accuracy |
|---|---|---|---|
| Tiny | 75 MB | ~1s | Good |
| Base | 142 MB | ~1.5s | Better |
| Small | 466 MB | ~3s | Great |
| Medium | 1.5 GB | ~7s | Excellent |
| Large-v1 | 2.9 GB | ~15s | Best |
| Large-v2 | 2.9 GB | ~15s | Best |
| Large-v3 | 2.9 GB | ~15s | Best |
Which model should I use?
- Tiny / Base: Quick dictation, low latency. Ideal for limited RAM or when speed matters most.
- Small: Best balance of speed and accuracy. Recommended starting point.
- Medium: Excellent accuracy. Best for 8+ GB RAM.
- Large (v1/v2/v3): Maximum accuracy, slower. For 16+ GB RAM. v3 is the latest.
Switch models anytime in Settings → Model. Models download once and are cached locally.
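The guidance above can be written as a simple rule of thumb. In this sketch the 8 GB and 16 GB thresholds come from the recommendations above; the 4 GB cutoff for "limited RAM" is an assumption (it matches the stated minimum requirement), and the function name is hypothetical.

```python
def recommend_model(ram_gb: float, prefer_speed: bool = False) -> str:
    """Suggest a Whisper model name from installed RAM, following the guidance above."""
    if prefer_speed:
        return "tiny"       # lowest latency when speed matters most
    if ram_gb <= 4:         # assumed cutoff for "limited RAM"
        return "base"
    if ram_gb < 8:
        return "small"      # recommended starting point
    if ram_gb < 16:
        return "medium"     # best for 8+ GB RAM
    return "large-v3"       # 16+ GB RAM; v3 is the latest Large model


if __name__ == "__main__":
    for ram in (4, 6, 8, 16):
        print(ram, "GB ->", recommend_model(ram))
```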
Settings
Access settings via the gear icon in the sidebar or the Settings page in the app.
Audio
- Input Device: Choose your microphone (defaults to system default)
- Silence Detection: Auto-stop recording after a silence period
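One common way to implement silence detection is to measure the loudness (RMS) of short audio frames and stop once enough consecutive frames fall below a threshold. The sketch below shows that general technique. It is not necessarily how SpeakToCode does it, and the threshold, frame length, and silence period are illustrative assumptions.

```python
import math


def rms(frame: list[int]) -> float:
    """Root-mean-square amplitude of one frame of 16-bit PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def should_auto_stop(
    frames: list[list[int]],
    threshold: float = 500.0,    # assumed loudness cutoff
    frame_ms: int = 20,          # assumed frame length
    max_silence_ms: int = 1500,  # assumed silence period
) -> bool:
    """Return True once the trailing run of quiet frames reaches max_silence_ms."""
    silent_ms = 0
    for frame in reversed(frames):
        if rms(frame) >= threshold:
            break  # the most recent loud frame ends the trailing silence
        silent_ms += frame_ms
        if silent_ms >= max_silence_ms:
            return True
    return False


if __name__ == "__main__":
    loud = [4000] * 320   # 20 ms of loud audio at 16 kHz
    quiet = [10] * 320    # 20 ms of near-silence
    print(should_auto_stop([loud] * 10 + [quiet] * 75))
```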
Model
- Whisper Model: Select transcription model size
- Language: Set expected language or leave on Auto
Hotkeys
- Toggle Shortcut: Customize toggle recording key (default: Ctrl+Alt+Space)
- Push-to-Talk: Customize push-to-talk key (default: Ctrl+Space)
General
- Launch at Startup: Start with Windows
- Minimize to Tray: Keep running in system tray
- Theme: Light, dark, or system preference
- Cloud Sync (Pro): Sync across devices
Integrations
SpeakToCode works with any application that accepts text input. It pastes transcribed text at your cursor via the system clipboard.
IDEs
VS Code, JetBrains, Vim/Neovim, Sublime Text, Notepad++
Terminals
Windows Terminal, PowerShell, CMD, WSL
Browsers
Chrome, Edge, Firefox, any text field
Communication
Slack, Discord, Teams, Outlook
Productivity
Notion, Google Docs, Word, Obsidian
Privacy & Security
Privacy is a core design principle, not an afterthought.
100% local processing
All transcription happens on your machine. Audio is never sent to any server.
No telemetry
No usage analytics, crash reports, or behavioral data collected.
No cloud dependency
Works fully offline. An internet connection is needed only for model downloads and optional sync.
Local storage
History and audio are stored in %LOCALAPPDATA%\SpeakToCode\ on your machine.
Open source engine
Whisper.cpp is open source and auditable.
For full details, see our Privacy Policy.
Troubleshooting
Microphone not detected
- Check that the mic is plugged in and set as the default input in Windows Sound Settings.
- Verify SpeakToCode has mic permission: Windows Settings → Privacy → Microphone (on Windows 11, Privacy & security → Microphone).
- Try selecting a specific input device in Settings → Audio.
Poor transcription accuracy
- Try a larger model (Small or Medium for significant improvement).
- Reduce background noise or use a noise-canceling mic.
- Speak clearly. Fast speech or mumbling reduces accuracy.
- Set language explicitly rather than using auto-detect.
High CPU usage during transcription
- This is normal. Whisper uses significant CPU during transcription and returns to idle after.
- Use a smaller model (Tiny or Base) to reduce load.
- Keep recordings under 30 seconds. Whisper processes audio in 30-second windows, so longer recordings need additional passes.
Model download fails
- Check your internet connection. Models download from Hugging Face, which some firewalls block.
- Delete partial downloads in %LOCALAPPDATA%\SpeakToCode\models\ and retry.
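To spot a partial download, it can help to list what is in the models folder and compare file sizes against the model table. The sketch below does that. The function names are hypothetical, and nothing is assumed about how the files are named; a file much smaller than its expected size is probably incomplete.

```python
import os
from pathlib import Path


def models_dir() -> Path:
    """Resolve %LOCALAPPDATA%\\SpeakToCode\\models\\ from the environment."""
    base = os.environ.get("LOCALAPPDATA")
    if not base:
        raise RuntimeError("LOCALAPPDATA is not set (it is normally defined on Windows)")
    return Path(base) / "SpeakToCode" / "models"


def list_model_files(directory: Path) -> list[tuple[str, float]]:
    """Return (file name, size in MB) for each file in the folder, smallest first."""
    if not directory.is_dir():
        return []
    files = [
        (p.name, p.stat().st_size / 1_000_000)
        for p in directory.iterdir()
        if p.is_file()
    ]
    return sorted(files, key=lambda item: item[1])


if __name__ == "__main__":
    for name, size_mb in list_model_files(models_dir()):
        print(f"{size_mb:10.1f} MB  {name}")
```

The script only reads the folder. Deleting a suspect file is left as a manual step.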
Text not pasting into target application
- Make sure the target window is focused before starting transcription.
- Some apps handle paste in their own way. If auto-paste fails, the text is still on the clipboard, so paste it with the app's own paste command or menu.
- Check that your clipboard isn't locked by another application.
Windows SmartScreen warning
- Click "More info", then "Run anyway". The app is safe; the warning appears because it is not yet code-signed. Code signing is coming soon.
FAQ
Is SpeakToCode free?
Yes! The free plan gives you 20 transcriptions/day with Tiny and Base models. The Pro plan ($7/mo) unlocks unlimited transcriptions, all 7 models, cloud sync, and AI text cleanup.
Does it work offline?
Yes. Once you've downloaded a model, SpeakToCode works completely offline.
What languages are supported?
Whisper supports 99+ languages. Set your preferred language in Settings for best results.
Can I use it for coding?
Yes! It works with all major IDEs and editors. Dictate comments, docs, variable names, commit messages, and more.
Does it support macOS or Linux?
Currently Windows-only. macOS and Linux are planned. Follow our changelog for updates.
How do I uninstall?
Windows Settings → Apps → Installed apps, find SpeakToCode, click Uninstall. To remove local data too, delete %LOCALAPPDATA%\SpeakToCode\.
Where can I get help?
Join the Discord community, open a GitHub issue, or contact us directly.