Client project

Voxy: real-time dictation into the active Windows application

Key result

Documented validation, shipped installer

Initial situation

The aim was to dictate into whichever app was already in the foreground (browser, email, word processor) instead of making a separate dictation window the main workflow.

Problem

The client needed something that lives quietly in the tray, stays reachable through global shortcuts, and inserts text into the active field without breaking the clipboard or hitting the wrong window.

Solution provided

Voxy is a tray app for Windows 10/11 (64-bit), built with .NET 8 and WPF. It provides global shortcuts (Toggle, Start/Stop, Push-to-Talk), WASAPI capture resampled to 16 kHz mono 16-bit PCM in frames of about 20 ms, a WebSocket “Protocol A” client that connects to a user-configured endpoint, and streaming transcription.

Text reaches the foreground window through the clipboard and a Ctrl+V keystroke sent with SendInput. In Toggle and Start/Stop modes, progressive append-only updates run on a timer (about 1 s to the first tick and about 2.5 s between ticks, per the documentation), with a flush when the segment completes. In Push-to-Talk mode, only finalized segments are injected, which keeps clipboard churn low on short holds.

The tray icon reflects activity. The clipboard is snapshotted and restored for Unicode text, injection stops when Voxy’s own window is focused, and the connection test walks the handshake through session_created. Logging uses Serilog with rotation, and settings are stored as JSON under AppData.

The Windows installer does not bundle a speech server. Documented validation used a local Python/FastAPI proxy in front of a realtime engine; any Protocol A-compatible backend can be configured. The installer is built with Inno Setup through build-installer.ps1 and offers optional startup with Windows.
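To make the numbers above concrete, here is a minimal Python sketch of the frame arithmetic and of what an append-only update could look like. It is illustrative only and is not Voxy's code (the app itself is .NET). The names `split_frames`, `append_only_delta`, and `tick_times` are hypothetical, the frame length is assumed to be exactly 20 ms, and the rule of skipping a tick when the transcript was revised is an assumed policy, not something the documentation states.

```python
SAMPLE_RATE_HZ = 16_000   # from the description: 16 kHz mono
BYTES_PER_SAMPLE = 2      # 16-bit PCM
FRAME_MS = 20             # description says "~20 ms"; exactly 20 is assumed here

FRAME_SAMPLES = SAMPLE_RATE_HZ * FRAME_MS // 1000   # 320 samples per frame
FRAME_BYTES = FRAME_SAMPLES * BYTES_PER_SAMPLE      # 640 bytes per frame


def split_frames(pcm: bytes) -> tuple[list[bytes], bytes]:
    """Cut a resampled PCM buffer into whole frames; return (frames, leftover)."""
    whole = len(pcm) - len(pcm) % FRAME_BYTES
    frames = [pcm[i:i + FRAME_BYTES] for i in range(0, whole, FRAME_BYTES)]
    return frames, pcm[whole:]


def append_only_delta(already_injected: str, transcript: str) -> str:
    """Return only the new suffix to paste on this tick.

    Append-only means pasted text is never retracted, so if the transcript
    no longer starts with what was pasted, nothing is sent (assumed policy).
    """
    if transcript.startswith(already_injected):
        return transcript[len(already_injected):]
    return ""


def tick_times(first_s: float = 1.0, interval_s: float = 2.5, count: int = 4) -> list[float]:
    """Seconds after segment start at which progressive updates would fire."""
    return [first_s + i * interval_s for i in range(count)]
```

Under these assumptions each frame carries 320 samples, or 640 bytes, and a segment that lasts nine seconds would see progressive updates at roughly 1.0, 3.5, 6.0, and 8.5 s before the final flush.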

Results

  • Shipped v1.0.0 with a self-contained Windows installer, user and technical documentation, and a validation report

  • End-to-end validation and injection tests across Gmail, LinkedIn, WhatsApp Web, Google Docs, generic browser fields, Notepad, and Word

  • Graceful handling when the backend stops mid-dictation, and tested recovery from corrupt settings

  • Outside the formal report scope: hot-swapping a USB microphone while a session is active