SpeakToCode

Documentation

Everything you need to get started with SpeakToCode and make the most of voice-to-text dictation.

Getting Started

System Requirements

OS

Windows 10 (1903+) or Windows 11

RAM

4 GB min (8 GB recommended)

Disk

200 MB + model files (75 MB – 2.9 GB)

Audio

Any microphone input device

Installation

1. Download

Download the latest installer directly from our website.

2. Install

Run SpeakToCode-Setup.exe. If Windows SmartScreen shows a warning, click "More info", then "Run anyway".

3. First model

On first launch, the Tiny Whisper model (~75 MB) downloads automatically. This only happens once.

4. Microphone access

Grant microphone permission when Windows prompts you.

First Transcription

  1. Open any text editor, IDE, or input field.
  2. Press Ctrl+Alt+Space to start recording (or hold Ctrl+Space for push-to-talk).
  3. Speak naturally into your microphone.
  4. Press Ctrl+Alt+Space again (or release Ctrl+Space). Your speech is transcribed locally and pasted at your cursor.

How It Works

SpeakToCode transcribes entirely on your local machine. Your audio involves no cloud API calls, no uploads, and no third-party services.

Record

Your microphone captures audio locally as a temporary WAV file.

Transcribe

Audio is processed by Whisper.cpp running natively on your CPU/GPU.

Paste

Transcribed text is copied to clipboard and auto-pasted at your cursor.

The entire pipeline completes in 1–15 seconds depending on the model and recording length.
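The three stages above can be sketched in Python as follows. This is a minimal illustration, not SpeakToCode's actual implementation: the function names, the 16 kHz mono format (the input that whisper.cpp's command-line examples expect), and the deletion of the temporary file after transcription are all assumptions. The transcription and paste steps are passed in as callables so the flow can be run without a microphone or a model.

```python
import os
import tempfile
import time
import wave
from typing import Callable

SAMPLE_RATE = 16_000  # assumption: 16 kHz mono, 16-bit PCM


def write_temp_wav(pcm16: bytes, sample_rate: int = SAMPLE_RATE) -> str:
    """Record step: store 16-bit mono PCM samples in a temporary WAV file."""
    fd, path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)  # reopen through the wave module below
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(pcm16)
    return path


def run_pipeline(
    pcm16: bytes,
    transcribe: Callable[[str], str],  # hypothetical: WAV path -> text
    paste: Callable[[str], None],      # hypothetical: text -> pasted at cursor
) -> float:
    """Run record -> transcribe -> paste and return the elapsed seconds."""
    start = time.perf_counter()
    wav_path = write_temp_wav(pcm16)
    try:
        text = transcribe(wav_path).strip()
    finally:
        os.remove(wav_path)  # assumption: the temporary file is discarded
    if text:  # skip the paste step when nothing was recognised
        paste(text)
    return time.perf_counter() - start
```

Passing a stub such as `lambda path: "hello world"` for `transcribe` and `print` for `paste` exercises the whole flow. In a real setup the `transcribe` callable would wrap the speech model.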

Keyboard Shortcuts

Shortcut          Action
Ctrl+Alt+Space    Toggle recording on/off
Ctrl+Space        Push-to-talk: hold to record, release to transcribe

Shortcuts work globally from any application. Customize them in Settings.

Recording Modes

Toggle Mode

Press Ctrl+Alt+Space to start. Press again to stop and transcribe. Best for longer dictation sessions.

Push-to-Talk

Hold Ctrl+Space while speaking. Release to transcribe. Best for quick bursts such as variable names, commit messages, and short comments.

Both modes show a floating microphone indicator so you always know when SpeakToCode is listening.
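The difference between the two modes comes down to which key events start and stop a recording. The small state machine below is a sketch of that logic. The class and method names are hypothetical, and the rule that each mode ignores the other's shortcut while a recording is in progress is an assumption, not documented behavior.

```python
from enum import Enum
from typing import List, Optional


class Mode(Enum):
    TOGGLE = "toggle"
    PUSH_TO_TALK = "push_to_talk"


class Recorder:
    """Tracks whether a recording is active and which mode started it."""

    def __init__(self) -> None:
        self.recording = False
        self.mode: Optional[Mode] = None
        self.events: List[str] = []  # log of actions, for illustration

    def _start(self, mode: Mode) -> None:
        self.recording = True
        self.mode = mode
        self.events.append("start")

    def _stop(self) -> None:
        self.recording = False
        self.mode = None
        self.events.append("transcribe")

    def on_toggle_pressed(self) -> None:
        """Ctrl+Alt+Space: the first press starts, the second press stops."""
        if not self.recording:
            self._start(Mode.TOGGLE)
        elif self.mode is Mode.TOGGLE:
            self._stop()
        # Assumption: ignored while a push-to-talk recording is active.

    def on_push_to_talk_down(self) -> None:
        """Ctrl+Space held down. Key-repeat events arrive as extra calls."""
        if not self.recording:
            self._start(Mode.PUSH_TO_TALK)

    def on_push_to_talk_up(self) -> None:
        """Ctrl+Space released: stop only a push-to-talk recording."""
        if self.recording and self.mode is Mode.PUSH_TO_TALK:
            self._stop()
```

Because `on_push_to_talk_down` does nothing while a recording is active, the repeated key-down events that Windows sends for a held key do not restart the recording.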

Whisper Models

SpeakToCode supports all 7 official Whisper models. Larger models are more accurate but slower.

Model      Size     Speed   Accuracy
Tiny       75 MB    ~1s     Good
Base       142 MB   ~1.5s   Better
Small      466 MB   ~3s     Great
Medium     1.5 GB   ~7s     Excellent
Large-v1   2.9 GB   ~15s    Best
Large-v2   2.9 GB   ~15s    Best
Large-v3   2.9 GB   ~15s    Best

Which model should I use?

  • Tiny / Base: Quick dictation, low latency. Ideal for limited RAM or when speed matters most.
  • Small: Best balance of speed and accuracy. Recommended starting point.
  • Medium: Excellent accuracy. Best for 8+ GB RAM.
  • Large (v1/v2/v3): Maximum accuracy, slower. For 16+ GB RAM. v3 is the latest.

Switch models anytime in Settings → Model. Models download once and are cached locally.
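The guidance above can be written down as a small helper, together with the disk-space arithmetic from the system requirements (200 MB for the app plus the size of each downloaded model). The function names are hypothetical, the sizes are the approximate figures from the model table, and the choice of Tiny for the low-latency case and Small below 8 GB of RAM is one reading of the recommendations, not a rule enforced by the app.

```python
from typing import Iterable

# Approximate download sizes in MB, taken from the model table.
MODEL_SIZES_MB = {
    "tiny": 75,
    "base": 142,
    "small": 466,
    "medium": 1500,
    "large-v1": 2900,
    "large-v2": 2900,
    "large-v3": 2900,
}
APP_SIZE_MB = 200  # base install size from the system requirements


def recommend_model(ram_gb: float, low_latency: bool = False) -> str:
    """Pick a model name from installed RAM, following the guidance above.

    Set low_latency when RAM is tight or speed matters most.
    """
    if low_latency:
        return "tiny"
    if ram_gb >= 16:
        return "large-v3"  # v3 is the latest Large model
    if ram_gb >= 8:
        return "medium"
    return "small"  # the recommended starting point


def disk_needed_mb(models: Iterable[str]) -> int:
    """Estimate disk usage: the app itself plus every downloaded model."""
    return APP_SIZE_MB + sum(MODEL_SIZES_MB[name] for name in models)
```

For example, keeping both Tiny and Small downloaded needs roughly 200 + 75 + 466 = 741 MB, which is what `disk_needed_mb(["tiny", "small"])` returns.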

Settings

Access settings via the gear icon in the sidebar or the Settings page in the app.

Audio

  • Input Device: Choose your microphone (defaults to system default)
  • Silence Detection: Auto-stop recording after a silence period
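To illustrate what auto-stopping after a silence period involves, here is a minimal energy-based sketch. It is one common approach, not necessarily the one SpeakToCode uses. The threshold, frame length, and silence duration are illustrative values, and the input is assumed to be 16-bit little-endian mono PCM delivered in short frames.

```python
import math
import struct


def rms(frame: bytes) -> float:
    """Root-mean-square amplitude of 16-bit little-endian mono PCM."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


class SilenceDetector:
    """Signals a stop once speech has been followed by enough quiet frames."""

    def __init__(
        self,
        threshold: float = 500.0,      # illustrative RMS level for "quiet"
        silence_seconds: float = 2.0,  # illustrative silence period
        frame_seconds: float = 0.02,   # illustrative frame length (20 ms)
    ) -> None:
        self.threshold = threshold
        self.frames_needed = max(1, round(silence_seconds / frame_seconds))
        self.quiet_frames = 0
        self.heard_speech = False

    def feed(self, frame: bytes) -> bool:
        """Process one audio frame; return True when recording should stop."""
        if rms(frame) >= self.threshold:
            self.heard_speech = True
            self.quiet_frames = 0
        else:
            self.quiet_frames += 1
        # Leading silence never triggers a stop: speech must come first.
        return self.heard_speech and self.quiet_frames >= self.frames_needed
```

A recording loop would call `feed` on each captured frame and stop as soon as it returns `True`. Requiring speech first avoids ending a recording before the user has started talking.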

Model

  • Whisper Model: Select transcription model size
  • Language: Set expected language or leave on Auto

Hotkeys

  • Toggle Shortcut: Customize toggle recording key (default: Ctrl+Alt+Space)
  • Push-to-Talk: Customize push-to-talk key (default: Ctrl+Space)

General

  • Launch at Startup: Start with Windows
  • Minimize to Tray: Keep running in system tray
  • Theme: Light, dark, or system preference
  • Cloud Sync (Pro): Sync across devices

Integrations

SpeakToCode works with any application that accepts text input. It pastes transcribed text at your cursor via the system clipboard.
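Clipboard-based pasting generally follows the pattern sketched below: put the text on the clipboard, then send the paste shortcut to the focused window. The `Clipboard` interface, `paste_at_cursor`, and the optional restore step are hypothetical names and behavior introduced for illustration. The clipboard and keystroke back ends are injected so the sketch does not depend on any particular Windows library.

```python
from typing import Callable, Protocol


class Clipboard(Protocol):
    """Hypothetical minimal clipboard interface."""

    def get_text(self) -> str: ...

    def set_text(self, text: str) -> None: ...


def paste_at_cursor(
    text: str,
    clipboard: Clipboard,
    send_paste_keys: Callable[[], None],  # e.g. synthesizes Ctrl+V
    restore_clipboard: bool = False,      # assumption: optional extra step
) -> None:
    """Copy text to the clipboard, then trigger a paste in the focused app."""
    previous = clipboard.get_text()
    clipboard.set_text(text)
    try:
        send_paste_keys()
    finally:
        if restore_clipboard:
            # Put back whatever the user had copied before dictating.
            clipboard.set_text(previous)
```

With `restore_clipboard` left at `False`, the transcribed text stays on the clipboard, so it can still be pasted by hand into applications that do not respond to the synthesized shortcut.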

IDEs

VS Code, JetBrains, Vim/Neovim, Sublime Text, Notepad++

Terminals

Windows Terminal, PowerShell, CMD, WSL

Browsers

Chrome, Edge, Firefox, any text field

Communication

Slack, Discord, Teams, Outlook

Productivity

Notion, Google Docs, Word, Obsidian

IDE tip: Place your cursor where you want text, then use push-to-talk for variable names, commit messages, and short comments.

Privacy & Security

Privacy is a core design principle, not an afterthought.

100% local processing

All transcription happens on your machine. Audio is never sent to any server.

No telemetry

No usage analytics, crash reports, or behavioral data collected.

No cloud dependency

Works fully offline. Internet only needed for model downloads and optional sync.

Local storage

History and audio stored in %LOCALAPPDATA%\SpeakToCode\ on your machine.
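For scripts that need to locate this folder, the path can be resolved from the LOCALAPPDATA environment variable. The helper names below are hypothetical. The models subfolder is the one referenced under Troubleshooting, and no other file or folder names inside the data directory are assumed.

```python
import os
from pathlib import PureWindowsPath
from typing import Mapping, Optional


def data_dir(env: Optional[Mapping[str, str]] = None) -> PureWindowsPath:
    """Return %LOCALAPPDATA%\\SpeakToCode as a Windows path."""
    env = os.environ if env is None else env
    base = env.get("LOCALAPPDATA")
    if not base:
        raise RuntimeError("LOCALAPPDATA is not set (expected on Windows)")
    return PureWindowsPath(base) / "SpeakToCode"


def models_dir(env: Optional[Mapping[str, str]] = None) -> PureWindowsPath:
    """Return the folder where downloaded models are cached."""
    return data_dir(env) / "models"
```

`PureWindowsPath` only builds the path and never touches the disk. On a Windows machine, wrapping the result in `pathlib.Path` allows listing or measuring the folder's contents.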

Open source engine

Whisper.cpp is open source and auditable.

For full details, see our Privacy Policy.

Troubleshooting

Microphone not detected

  • Check that the mic is plugged in and set as the default in Windows Sound Settings.
  • Verify SpeakToCode has mic permission: Windows Settings → Privacy → Microphone (Privacy & security → Microphone on Windows 11).
  • Try selecting a specific input device in Settings → Audio.

Poor transcription accuracy

  • Try a larger model (Small or Medium for significant improvement).
  • Reduce background noise or use a noise-canceling mic.
  • Speak clearly. Fast speech or mumbling reduces accuracy.
  • Set the language explicitly rather than using auto-detect.

High CPU usage during transcription

  • This is normal. Whisper uses significant CPU during transcription and returns to idle afterward.
  • Use a smaller model (Tiny or Base) to reduce load.
  • Keep recordings under 30 seconds.

Model download fails

  • Check your internet connection. Downloads from Hugging Face may be blocked by firewalls.
  • Delete partial downloads in %LOCALAPPDATA%\SpeakToCode\models\ and retry.

Text not pasting into target application

  • Make sure the target window is focused before starting transcription.
  • Some apps handle paste in their own way. In those apps, paste manually using the app's own paste command or menu.
  • Check that your clipboard isn't locked by another application.

Windows SmartScreen warning

  • Click "More info", then "Run anyway". The app is safe to run. The installer is not yet code-signed, and code signing is coming soon.

FAQ

Is SpeakToCode free?

Yes! The free plan gives you 20 transcriptions/day with Tiny and Base models. The Pro plan ($7/mo) unlocks unlimited transcriptions, all 7 models, cloud sync, and AI text cleanup.

Does it work offline?

Yes. Once you've downloaded a model, SpeakToCode works completely offline.

What languages are supported?

Whisper supports 99+ languages. Set your preferred language in Settings for best results.

Can I use it for coding?

Yes! It works with all major IDEs and editors. Dictate comments, docs, variable names, commit messages, and more.

Does it support macOS or Linux?

Currently Windows-only. macOS and Linux are planned. Follow our changelog for updates.

How do I uninstall?

Windows Settings → Apps → Installed apps, find SpeakToCode, click Uninstall. To remove local data too, delete %LOCALAPPDATA%\SpeakToCode\.

Where can I get help?