This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
VoiceFlow is a cross-platform voice-to-text paste utility built with Pyloid (Python desktop framework using PySide6/Qt WebEngine) and React. Users hold a hotkey to record audio, release to transcribe using faster-whisper, and the text is automatically pasted at the cursor. Supports Windows, Linux (Wayland/X11), and macOS.
# Initial setup (installs both Node and Python dependencies)
pnpm run setup
# Development mode (runs Vite frontend + Pyloid backend concurrently)
pnpm run dev
# Development with hot-reload for Python changes
pnpm run dev:watch
# Build desktop application
pnpm run build
# Build platform installers (run on each target OS)
pnpm run build:installer # Windows (.exe via Inno Setup)
pnpm run build:installer:linux # Linux (.tar.gz + .AppImage)
pnpm run build:installer:macos # macOS (.dmg)
# Run Python tests
cd VoiceFlow && uv run -p .venv pytest src-pyloid/tests/
# Run single test file
uv run -p .venv pytest src-pyloid/tests/test_transcription.py -v
# Run frontend only (for UI development)
pnpm run vite
# Lint frontend
pnpm run lint
Python backend using Pyloid framework with PySide6:
- main.py - Application entry point. Creates Pyloid app, tray icon, main dashboard window, and recording popup window. Sets up UI callbacks connecting backend events to popup state changes.
- server.py - RPC server using PyloidRPC. Exposes methods (get_settings, update_settings, get_history, etc.) that the frontend calls via pyloid-js RPC.
- app_controller.py - Singleton controller orchestrating all services. Handles the hotkey activate/deactivate flow: start recording -> stop recording -> transcribe -> paste at cursor -> save to history.
Services (src-pyloid/services/):
- audio.py - Microphone recording using sounddevice, streams amplitude for visualizer
- transcription.py - faster-whisper model loading and transcription
- hotkey.py - Global hotkey listener using keyboard library
- clipboard.py - Clipboard operations and paste-at-cursor using pyautogui
- settings.py - Settings management with defaults
- database.py - SQLite database for settings and history (stored at ~/.VoiceFlow/VoiceFlow.db)
- logger.py - Domain-based logging with hybrid format [timestamp] [LEVEL] [domain] message | {json}. Supports domains: model, audio, hotkey, settings, database, clipboard, window. Configured with 100MB log rotation.
- model_manager.py - Whisper model download/cache management using huggingface_hub. Provides download progress tracking (percent, speed, ETA), cancellation via CancelToken, daemon thread execution, and clear_cache() to delete only VoiceFlow's faster-whisper models.
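The hybrid log format described for logger.py can be sketched with the standard library alone. This is a minimal illustration, not the project's actual logger: HybridFormatter, make_rotating_handler, format_example, the timestamp layout, and backupCount=3 are all assumptions made for the example. Only the line shape and the 100MB rotation size come from the description above.

```python
import json
import logging
from logging.handlers import RotatingFileHandler


class HybridFormatter(logging.Formatter):
    """Render records as: [timestamp] [LEVEL] [domain] message | {json}."""

    def format(self, record: logging.LogRecord) -> str:
        # The timestamp layout is an assumption; the source only shows "[timestamp]".
        timestamp = self.formatTime(record, "%Y-%m-%d %H:%M:%S")
        domain = getattr(record, "domain", "app")
        line = f"[{timestamp}] [{record.levelname}] [{domain}] {record.getMessage()}"
        data = getattr(record, "data", None)
        if data:
            # Structured context is appended as JSON after the pipe separator.
            line += " | " + json.dumps(data, sort_keys=True)
        return line


def make_rotating_handler(path: str) -> RotatingFileHandler:
    """Hypothetical helper: file handler that rotates at 100MB."""
    handler = RotatingFileHandler(
        path, maxBytes=100 * 1024 * 1024, backupCount=3, encoding="utf-8"
    )
    handler.setFormatter(HybridFormatter())
    return handler


def format_example() -> str:
    """Format one record in the 'model' domain without touching the filesystem."""
    record = logging.LogRecord(
        "voiceflow", logging.INFO, "example.py", 0, "Model loaded", None, None
    )
    record.domain = "model"
    record.data = {"name": "tiny"}
    return HybridFormatter().format(record)
```

With a standard logger, the same fields could be supplied per call through the extra argument, for example extra={"domain": "audio", "data": {...}}.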
React 18 + TypeScript + Vite frontend:
- App.tsx - Hash-based routing between /popup, /onboarding, and /dashboard. Checks model cache on startup and shows recovery modal if model is missing.
- lib/api.ts - RPC wrapper using pyloid-js to call Python backend methods. Includes model management APIs (getModelInfo, startModelDownload, cancelModelDownload).
- lib/types.ts - TypeScript interfaces for Settings, HistoryEntry, Stats, Options, ModelInfo, DownloadProgress
- pages/ - Popup (recording indicator), Onboarding (includes model download step), Dashboard
- components/ - Feature components plus shadcn/ui components in components/ui/
  - ModelDownloadProgress.tsx - Download progress UI with progress bar, speed, ETA, and retry support
  - ModelDownloadModal.tsx - Dialog wrapper for model downloads triggered from settings
  - ModelRecoveryModal.tsx - Startup modal for missing model recovery
The frontend uses pyloid-js RPC to call Python methods:
import { rpc } from "pyloid-js";
const settings = await rpc.call("get_settings");
Backend sends events to popup window via:
popup_window.invoke('popup-state', {'state': 'recording'})
- User holds hotkey (configurable, default Ctrl+Win)
- HotkeyService.on_activate -> AppController._handle_hotkey_activate -> AudioService.start_recording
- Popup transitions to "recording" state, shows amplitude visualizer
- User releases hotkey
- AudioService.stop_recording returns audio numpy array
- TranscriptionService.transcribe runs faster-whisper
- ClipboardService.paste_at_cursor pastes text
- History saved to database
- Popup returns to "idle" state
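The recording flow above can be sketched as a small coordinator with injected callables. This is an illustrative sketch only: RecordingFlow and run_demo are hypothetical names, the real services are replaced by stubs, and skipping paste and history when the transcription is empty is an assumption the source does not state.

```python
from typing import Any, Callable, List


class RecordingFlow:
    """Hypothetical coordinator mirroring the hotkey activate/deactivate steps."""

    def __init__(
        self,
        start_recording: Callable[[], Any],
        stop_recording: Callable[[], Any],
        transcribe: Callable[[Any], str],
        paste: Callable[[str], Any],
        save_history: Callable[[str], Any],
        set_popup_state: Callable[[str], Any],
    ) -> None:
        self._start_recording = start_recording
        self._stop_recording = stop_recording
        self._transcribe = transcribe
        self._paste = paste
        self._save_history = save_history
        self._set_popup_state = set_popup_state

    def on_activate(self) -> None:
        # Hotkey pressed: begin capturing audio and show the recording state.
        self._start_recording()
        self._set_popup_state("recording")

    def on_deactivate(self) -> str:
        # Hotkey released: stop, transcribe, paste, persist, then go idle.
        audio = self._stop_recording()
        text = self._transcribe(audio)
        if text:  # Assumption: nothing is pasted or saved for empty output.
            self._paste(text)
            self._save_history(text)
        self._set_popup_state("idle")
        return text


def run_demo() -> List[str]:
    """Drive one press/release cycle with stub services and record the order."""
    events: List[str] = []
    flow = RecordingFlow(
        start_recording=lambda: events.append("start_recording"),
        stop_recording=lambda: events.append("stop_recording") or [0.0, 0.1],
        transcribe=lambda audio: events.append("transcribe") or "hello world",
        paste=lambda text: events.append(f"paste:{text}"),
        save_history=lambda text: events.append("save_history"),
        set_popup_state=lambda state: events.append(f"popup:{state}"),
    )
    flow.on_activate()
    flow.on_deactivate()
    return events
```

Injecting the services as callables keeps the ordering testable without a microphone, a model, or a window.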
The keyboard library runs hotkey callbacks in a separate thread, but Qt requires UI operations on the main thread. The solution uses Qt signals/slots:
- ThreadSafeSignals class in main.py defines signals (recording_started, recording_stopped, etc.)
- Callback functions from AppController emit signals instead of directly updating UI
- Signals connect to slot functions with Qt.QueuedConnection to ensure they run on the main thread
- Slot functions safely update popup state and window properties
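The idea behind the queued connection can be shown without Qt. The sketch below is a standard-library stand-in, not the project's ThreadSafeSignals class: a background thread only posts work, and the owning thread runs it when it drains the queue, which is roughly what Qt's event loop does for a queued signal. MainThreadDispatcher and run_demo are hypothetical names.

```python
import queue
import threading
from typing import Callable, List, Tuple


class MainThreadDispatcher:
    """Collects calls from any thread; runs them only when the owner drains."""

    def __init__(self) -> None:
        self._pending: "queue.Queue[Callable[[], None]]" = queue.Queue()

    def post(self, func: Callable[[], None]) -> None:
        # Safe to call from any thread: queue.Queue handles the locking.
        self._pending.put(func)

    def drain(self) -> int:
        # Called by the owning (UI) thread; runs everything queued so far.
        handled = 0
        while True:
            try:
                func = self._pending.get_nowait()
            except queue.Empty:
                return handled
            func()
            handled += 1


def run_demo() -> List[Tuple[str, bool]]:
    """Post a UI update from a worker thread and run it on the calling thread."""
    dispatcher = MainThreadDispatcher()
    owner = threading.current_thread()
    seen: List[Tuple[str, bool]] = []

    def update_popup(state: str) -> None:
        # Record the state and whether we are running on the owning thread.
        seen.append((state, threading.current_thread() is owner))

    def hotkey_callback() -> None:
        # Runs on the worker thread: it never touches the UI directly.
        dispatcher.post(lambda: update_popup("recording"))

    worker = threading.Thread(target=hotkey_callback, daemon=True)
    worker.start()
    worker.join()
    dispatcher.drain()
    return seen
```

In the Qt version, emitting the signal plays the role of post() and the event loop plays the role of drain().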
For transparent popup windows on Windows:
- Set transparent=True when creating the window
- Call qwindow.setAttribute(Qt.WA_TranslucentBackground, True) on the Qt window
- Set webview.page().setBackgroundColor(QColor(0, 0, 0, 0)) before loading URL
- Re-apply WA_TranslucentBackground after any setWindowFlags() call
- App/Onboarding/Settings triggers model download via startModelDownload(modelName)
- ModelManager creates daemon thread, starts huggingface_hub.snapshot_download()
- Custom tqdm class captures progress, sends updates via callback (throttled to 10/sec)
- Frontend receives download-progress events with percent, speed, ETA
- User can cancel via cancelModelDownload() which sets CancelToken
- On completion, model is cached in huggingface cache directory
- Turbo model uses mobiuslabsgmbh/faster-whisper-large-v3-turbo (same as faster-whisper internal mapping)
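The throttling and cancellation steps in the download flow can be sketched as follows. This is a minimal stand-in and does not subclass tqdm: ThrottledProgress and DownloadCancelled are hypothetical, this CancelToken is a guess at the shape of the real one, and the way speed and ETA are computed is an assumption. Only the 10 updates per second limit and the percent, speed, and ETA fields come from the description above.

```python
import threading
import time
from typing import Callable, Dict, Optional


class CancelToken:
    """Assumed shape: a thread-safe flag set by the cancel request."""

    def __init__(self) -> None:
        self._event = threading.Event()

    def cancel(self) -> None:
        self._event.set()

    @property
    def cancelled(self) -> bool:
        return self._event.is_set()


class DownloadCancelled(Exception):
    """Raised inside the download thread once the token is cancelled."""


class ThrottledProgress:
    """Reports progress at most once per min_interval seconds."""

    def __init__(
        self,
        total_bytes: int,
        on_progress: Callable[[Dict[str, Optional[float]]], None],
        token: CancelToken,
        min_interval: float = 0.1,  # 10 updates per second
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._total = total_bytes  # assumed to be greater than zero
        self._on_progress = on_progress
        self._token = token
        self._min_interval = min_interval
        self._clock = clock
        self._start = clock()
        self._last_emit: Optional[float] = None
        self._done = 0

    def update(self, n_bytes: int) -> None:
        # Checking the token on every chunk lets a cancel take effect quickly.
        if self._token.cancelled:
            raise DownloadCancelled()
        self._done += n_bytes
        now = self._clock()
        finished = self._done >= self._total
        too_soon = (
            self._last_emit is not None
            and now - self._last_emit < self._min_interval
        )
        if too_soon and not finished:
            return  # Drop this update; the final one is always sent.
        self._last_emit = now
        elapsed = max(now - self._start, 1e-9)
        speed = self._done / elapsed  # bytes per second since start
        remaining = max(self._total - self._done, 0)
        self._on_progress({
            "percent": min(100.0, 100.0 * self._done / self._total),
            "speed": speed,
            "eta": remaining / speed if speed > 0 else None,
        })
```

Passing the clock in as a parameter makes the throttle testable without sleeping.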
- Singleton controller: get_controller() returns singleton AppController instance
- UI callbacks: Backend notifies frontend of state changes via callbacks set in set_ui_callbacks()
- Thread-safe signals: Qt signals with QueuedConnection marshal UI updates from background threads to main thread
- Background threads: Model loading, downloads, and transcription run in daemon threads
- Domain logging: All services use get_logger(domain) for structured logging with domains like model, audio, hotkey, etc.
- Custom hotkeys: Supports modifier-only combos (e.g., Ctrl+Win) and standard combos (e.g., Ctrl+R). Frontend captures keys, backend validates and registers.
- Path alias: Frontend uses @/ for src/ imports (configured in tsconfig.json and vite.config.ts)
- Lazy pyautogui: ClipboardService lazy-loads pyautogui via _get_pyautogui() to avoid mouseinfo's sys.exit() when tkinter is missing in bundled builds
- Reduced effects on Linux: App.tsx adds reduced-effects class on Linux to disable backdrop-filter, animated orbs, and noise texture that cause sluggish UI under Qt WebEngine software rendering. Popup route is excluded (needs transparency).
- Popup visibility: showPopup setting controls the floating recording indicator. Backend uses _popup_visible flag and on_popup_visibility_changed() callback. When hidden, resize_popup() is skipped to prevent re-showing.
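The lazy pyautogui item above relies on deferring an import until first use. The following is a generic sketch of that pattern, not the body of _get_pyautogui(): lazy_module is a hypothetical helper, and catching SystemExit around the import is an assumption about how a sys.exit() raised during import might be contained.

```python
import importlib
from types import ModuleType
from typing import Dict, Optional

# Cache of import attempts, so a failing import is only tried once.
_cache: Dict[str, Optional[ModuleType]] = {}


def lazy_module(name: str) -> Optional[ModuleType]:
    """Import a module on first use; return None if it cannot be loaded."""
    if name not in _cache:
        try:
            _cache[name] = importlib.import_module(name)
        except (ImportError, SystemExit):
            # SystemExit covers modules that call sys.exit() while importing.
            _cache[name] = None
    return _cache[name]
```

A caller would check for None and fall back to another paste mechanism, so a missing optional dependency never terminates the application at startup.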
- evdev hotkeys: Linux uses evdev for keyboard input instead of keyboard library (Wayland-compatible)
- Wayland clipboard: Uses wl-copy for clipboard, wtype/dotool/ydotool for paste keystroke, falls back to pyautogui via XWayland
- Qt WebEngine flags: QTWEBENGINE_ENABLE_LINUX_ACCESSIBILITY=0 and Chromium flags (--use-gl=egl, --disable-gpu-sandbox) set in main.py to improve rendering performance
- Hyprland rules: _setup_hyprland_window_rules() in main.py configures popup as floating, pinned, no-focus via hyprctl
- Multi-monitor: get_active_monitor_info() re-detects cursor monitor on each recording start
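The Wayland paste fallback described above amounts to probing for the first available tool. A minimal sketch, assuming the tools are tried in the order they are listed; pick_paste_backend is a hypothetical name and the sketch only selects a backend without invoking it.

```python
import shutil
from typing import Callable, Optional, Sequence

# Order follows the list above; treating it as a preference order is an assumption.
PASTE_TOOLS = ("wtype", "dotool", "ydotool")


def pick_paste_backend(
    which: Callable[[str], Optional[str]] = shutil.which,
    tools: Sequence[str] = PASTE_TOOLS,
) -> str:
    """Return the first paste tool found on PATH, else 'pyautogui'."""
    for tool in tools:
        if which(tool):
            return tool
    # No native Wayland tool is installed: fall back to pyautogui via XWayland.
    return "pyautogui"
```

Passing which as a parameter allows the selection logic to be tested on a machine where none of the tools are installed.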
Python tests use pytest and are in src-pyloid/tests/. Test files include:
- test_logger.py - Logger infrastructure tests
- test_model_manager.py - Model download/cache management tests
- test_transcription.py - Transcription service tests (slow, downloads model on first run)
- test_audio.py, test_hotkey.py, test_clipboard.py, test_settings.py, test_app_controller.py
Uses shadcn/ui (New York style) with Tailwind CSS v4. Add components via:
npx shadcn@latest add <component>
- Never add co-author lines to commit messages
- Keep commit messages concise and descriptive
- Use conventional commit prefixes: fix:, feat:, update:, refactor:, etc.
- Follow the release guide at docs/plans/release-guide.md for creating releases