Voxy: real-time dictation into the active Windows application
Key result
Documented validation, shipped installer
Initial situation
The aim was to dictate directly into whichever app was already in the foreground (browser, email, word processor), rather than making a separate dictation window the main workflow.
Problem
The client needed something that lives quietly in the tray, stays reachable through global shortcuts, and inserts text into the active field without breaking the clipboard or hitting the wrong window.
Solution provided
Windows 10/11 (64-bit) tray app built with .NET 8 and WPF. It combines global shortcuts (Toggle, Start/Stop, Push-to-Talk), WASAPI capture (resampled to 16 kHz mono 16-bit PCM in ~20 ms frames), a WebSocket “Protocol A” client that connects to a user-configured endpoint, and streaming transcription.
Text reaches the foreground window through the clipboard and Ctrl+V (SendInput). In Toggle and Start/Stop modes, progressive append-only updates run on a timer (about 1 s to the first tick, then ~2.5 s between ticks, per the documentation), with a flush when the segment completes. In Push-to-Talk, only finalized segments are injected, which keeps clipboard churn low on short holds.
The tray icon reflects activity. The clipboard is snapshotted and restored for Unicode text, injection stops when Voxy’s own window is focused, and the connection test walks the handshake through session_created. Logging uses Serilog with rotation; settings are stored as JSON under AppData.
The Windows installer does not bundle a speech server. Documented validation used a local Python/FastAPI proxy in front of a realtime engine; any Protocol A-compatible backend can be configured. The Inno Setup installer offers optional startup with Windows and is built through build-installer.ps1.
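The audio and timing figures above can be made concrete with a little arithmetic. The sketch below is illustrative only and is not taken from Voxy's .NET code: the function names are hypothetical, and the rule for what happens when the engine revises earlier words (hold back and wait for the flush) is an assumption, since the description only says that updates are append-only.

```python
# Illustrative sketch; names and the revision policy are assumptions,
# not Voxy's actual implementation.

SAMPLE_RATE_HZ = 16_000   # 16 kHz, from the description
BYTES_PER_SAMPLE = 2      # 16-bit PCM
CHANNELS = 1              # mono
FRAME_MS = 20             # ~20 ms frames


def frame_size_bytes(sample_rate_hz: int = SAMPLE_RATE_HZ,
                     frame_ms: int = FRAME_MS,
                     bytes_per_sample: int = BYTES_PER_SAMPLE,
                     channels: int = CHANNELS) -> int:
    """Size in bytes of one PCM frame of the given duration."""
    samples_per_frame = sample_rate_hz * frame_ms // 1000
    return samples_per_frame * bytes_per_sample * channels


def append_only_delta(injected: str, transcript: str) -> str:
    """Text that can be appended so that injected + delta == transcript.

    Returns an empty string when the transcript does not extend what was
    already injected (assumed policy: never rewrite text in the target app).
    """
    if transcript.startswith(injected):
        return transcript[len(injected):]
    return ""


def tick_times(duration_s: float,
               first_tick_s: float = 1.0,
               interval_s: float = 2.5) -> list[float]:
    """Timer ticks that fall within a segment of the given length."""
    times = []
    t = first_tick_s
    while t <= duration_s:
        times.append(t)
        t += interval_s
    return times
```

Under these assumptions a 20 ms frame holds 320 samples (640 bytes), and a 7-second segment would see progressive updates at 1.0 s, 3.5 s and 6.0 s before the final flush.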
Results
Shipped v1.0.0 with a self-contained Windows installer, user and technical documentation, and a validation report
End-to-end validation and injection tests across Gmail, LinkedIn, WhatsApp Web, Google Docs, generic browser fields, Notepad, and Word
Graceful handling when the backend stops mid-dictation; recovery from corrupt settings tested
Outside the formal report scope: hot-swapping a USB microphone while a session is active