CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

VoiceFlow is a cross-platform voice-to-text paste utility built with Pyloid (a Python desktop framework using PySide6/Qt WebEngine) and React. Users hold a hotkey to record audio and release it to transcribe with faster-whisper; the text is then pasted automatically at the cursor. It supports Windows, Linux (Wayland/X11), and macOS.

Commands

# Initial setup (installs both Node and Python dependencies)
pnpm run setup

# Development mode (runs Vite frontend + Pyloid backend concurrently)
pnpm run dev

# Development with hot-reload for Python changes
pnpm run dev:watch

# Build desktop application
pnpm run build

# Build platform installers (run on each target OS)
pnpm run build:installer          # Windows (.exe via Inno Setup)
pnpm run build:installer:linux    # Linux (.tar.gz + .AppImage)
pnpm run build:installer:macos    # macOS (.dmg)

# Run Python tests
cd VoiceFlow && uv run -p .venv pytest src-pyloid/tests/

# Run single test file
uv run -p .venv pytest src-pyloid/tests/test_transcription.py -v

# Run frontend only (for UI development)
pnpm run vite

# Lint frontend
pnpm run lint

Architecture

Backend (src-pyloid/)

Python backend using Pyloid framework with PySide6:

main.py - Application entry point. Creates Pyloid app, tray icon, main dashboard window, and recording popup window. Sets up UI callbacks connecting backend events to popup state changes.
server.py - RPC server using PyloidRPC. Exposes methods (get_settings, update_settings, get_history, etc.) that frontend calls via pyloid-js RPC.
app_controller.py - Singleton controller orchestrating all services. Handles hotkey activate/deactivate flow: start recording -> stop recording -> transcribe -> paste at cursor -> save to history.

Services (src-pyloid/services/):

audio.py - Microphone recording using sounddevice, streams amplitude for visualizer
transcription.py - faster-whisper model loading and transcription
hotkey.py - Global hotkey listener using the keyboard library (evdev on Linux, see Linux-Specific)
clipboard.py - Clipboard operations and paste-at-cursor using pyautogui (Wayland tools on Linux, see Linux-Specific)
settings.py - Settings management with defaults
database.py - SQLite database for settings and history (stored at ~/.VoiceFlow/VoiceFlow.db)
logger.py - Domain-based logging with hybrid format [timestamp] [LEVEL] [domain] message | {json}. Supports domains: model, audio, hotkey, settings, database, clipboard, window. Configured with 100MB log rotation.
model_manager.py - Whisper model download/cache management using huggingface_hub. Provides download progress tracking (percent, speed, ETA), cancellation via CancelToken, daemon thread execution, and clear_cache() to delete only VoiceFlow's faster-whisper models.
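
The hybrid log format described for logger.py can be approximated with the standard library. This is a minimal sketch under stated assumptions, not the project's actual logger.py: the HybridFormatter class, the "data" extra key, the "voiceflow." logger-name prefix, the timestamp layout, and the backup count are all hypothetical. Only the line layout, the get_logger(domain) entry point, and the 100MB rotation come from the description above.

```python
import json
import logging
from logging.handlers import RotatingFileHandler
from typing import Optional

MAX_LOG_BYTES = 100 * 1024 * 1024  # 100MB rotation, as described above


class HybridFormatter(logging.Formatter):
    """Renders records as: [timestamp] [LEVEL] [domain] message | {json}."""

    def format(self, record: logging.LogRecord) -> str:
        timestamp = self.formatTime(record, "%Y-%m-%d %H:%M:%S")
        domain = record.name.rsplit(".", 1)[-1]  # "voiceflow.audio" -> "audio"
        line = f"[{timestamp}] [{record.levelname}] [{domain}] {record.getMessage()}"
        data = getattr(record, "data", None)  # hypothetical key passed via extra=
        if data:
            line += " | " + json.dumps(data, sort_keys=True)
        return line


def get_logger(domain: str, log_path: Optional[str] = None) -> logging.Logger:
    """Returns the logger for one domain; writes to a rotating file if log_path is given."""
    logger = logging.getLogger(f"voiceflow.{domain}")
    if not logger.handlers:  # configure each domain logger only once
        if log_path:
            handler: logging.Handler = RotatingFileHandler(
                log_path, maxBytes=MAX_LOG_BYTES, backupCount=3  # backupCount is an assumption
            )
        else:
            handler = logging.StreamHandler()
        handler.setFormatter(HybridFormatter())
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger
```

With this sketch, a call such as get_logger("audio").info("recording started", extra={"data": {"device": 0}}) would produce a line ending in [INFO] [audio] recording started | {"device": 0}.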

Frontend (src/)

React 18 + TypeScript + Vite frontend:

App.tsx - Hash-based routing between /popup, /onboarding, and /dashboard. Checks model cache on startup and shows recovery modal if model is missing.
lib/api.ts - RPC wrapper using pyloid-js to call Python backend methods. Includes model management APIs (getModelInfo, startModelDownload, cancelModelDownload).
lib/types.ts - TypeScript interfaces for Settings, HistoryEntry, Stats, Options, ModelInfo, DownloadProgress
pages/ - Popup (recording indicator), Onboarding (includes model download step), Dashboard
components/ - Feature components plus shadcn/ui components in components/ui/
- ModelDownloadProgress.tsx - Download progress UI with progress bar, speed, ETA, and retry support
- ModelDownloadModal.tsx - Dialog wrapper for model downloads triggered from settings
- ModelRecoveryModal.tsx - Startup modal for missing model recovery

Frontend-Backend Communication

The frontend uses pyloid-js RPC to call Python methods:

import { rpc } from "pyloid-js";
const settings = await rpc.call("get_settings");

The backend sends events to the popup window via:

popup_window.invoke('popup-state', {'state': 'recording'})

Recording Flow

User holds hotkey (configurable, default Ctrl+Win)
HotkeyService.on_activate -> AppController._handle_hotkey_activate -> AudioService.start_recording
Popup transitions to "recording" state, shows amplitude visualizer
User releases hotkey
AudioService.stop_recording returns audio numpy array
TranscriptionService.transcribe runs faster-whisper
ClipboardService.paste_at_cursor pastes text
History saved to database
Popup returns to "idle" state
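
The recording flow above can be sketched end to end with stand-in services. This is an illustration only: the Fake* classes, the Controller class, _handle_hotkey_deactivate, the on_state callback, and the in-memory history list are hypothetical names, and the sketch runs synchronously, whereas the real app runs transcription in a background thread and stores history in SQLite.

```python
from typing import Callable, List


class FakeAudio:
    """Stand-in for AudioService; returns a list instead of a numpy array."""

    def __init__(self) -> None:
        self.recording = False

    def start_recording(self) -> None:
        self.recording = True

    def stop_recording(self) -> List[float]:
        self.recording = False
        return [0.0, 0.1, -0.1]


class FakeTranscriber:
    """Stand-in for TranscriptionService; returns fixed text for non-empty audio."""

    def transcribe(self, audio: List[float]) -> str:
        return "hello world" if audio else ""


class FakeClipboard:
    """Stand-in for ClipboardService; records text instead of pasting it."""

    def __init__(self) -> None:
        self.pasted: List[str] = []

    def paste_at_cursor(self, text: str) -> None:
        self.pasted.append(text)


class Controller:
    """Wires the services together in the order listed above."""

    def __init__(self, audio, transcriber, clipboard, on_state: Callable[[str], None]) -> None:
        self.audio = audio
        self.transcriber = transcriber
        self.clipboard = clipboard
        self.on_state = on_state
        self.history: List[str] = []

    def _handle_hotkey_activate(self) -> None:
        self.audio.start_recording()
        self.on_state("recording")  # popup shows the visualizer

    def _handle_hotkey_deactivate(self) -> None:
        audio = self.audio.stop_recording()
        text = self.transcriber.transcribe(audio)
        if text:  # skip paste and history for empty transcriptions
            self.clipboard.paste_at_cursor(text)
            self.history.append(text)
        self.on_state("idle")
```

Passing the services into the constructor keeps the flow testable without a microphone, a model, or a clipboard.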

Qt Threading Pattern

The keyboard library runs hotkey callbacks in a separate thread, but Qt requires UI operations on the main thread. The solution uses Qt signals/slots:

ThreadSafeSignals class in main.py defines signals (recording_started, recording_stopped, etc.)
Callback functions from AppController emit signals instead of directly updating UI
Signals connect to slot functions with Qt.QueuedConnection to ensure they run on the main thread
Slot functions safely update popup state and window properties
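
The hand-off that Qt.QueuedConnection performs can be illustrated without Qt. The sketch below swaps Qt's event loop for a plain queue.Queue so that it runs with the standard library alone. MainThreadDispatcher, emit, process_events, and run_demo are hypothetical names and are not part of main.py or PySide6. The principle matches the pattern above: the background thread only enqueues the call, and the owning thread executes it later.

```python
import queue
import threading
from typing import Callable, List, Tuple


class MainThreadDispatcher:
    """Accepts calls from any thread; the owning thread runs them when it drains the queue."""

    def __init__(self) -> None:
        self._pending: "queue.Queue[Callable[[], None]]" = queue.Queue()

    def emit(self, slot: Callable[..., None], *args) -> None:
        # Safe to call from a background thread: nothing runs here, the call is only queued.
        self._pending.put(lambda: slot(*args))

    def process_events(self) -> int:
        # Called by the owning thread (the analogue of Qt's event loop iteration).
        ran = 0
        while True:
            try:
                job = self._pending.get_nowait()
            except queue.Empty:
                return ran
            job()
            ran += 1


def run_demo() -> List[Tuple[str, bool]]:
    """Emits from a worker thread and records whether the slot ran on the caller's thread."""
    dispatcher = MainThreadDispatcher()
    owner = threading.get_ident()
    seen: List[Tuple[str, bool]] = []

    def on_recording_started(state: str) -> None:
        seen.append((state, threading.get_ident() == owner))

    worker = threading.Thread(
        target=dispatcher.emit, args=(on_recording_started, "recording"), daemon=True
    )
    worker.start()
    worker.join()
    dispatcher.process_events()  # the slot executes here, on the owning thread
    return seen
```

In the real application Qt delivers the queued call automatically, so no explicit process_events step is needed.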

Popup Window Transparency

For transparent popup windows on Windows:

Set transparent=True when creating the window
Call qwindow.setAttribute(Qt.WA_TranslucentBackground, True) on the Qt window
Set webview.page().setBackgroundColor(QColor(0, 0, 0, 0)) before loading URL
Re-apply WA_TranslucentBackground after any setWindowFlags() call

Model Download Flow

App/Onboarding/Settings triggers model download via startModelDownload(modelName)
ModelManager creates daemon thread, starts huggingface_hub.snapshot_download()
Custom tqdm class captures progress, sends updates via callback (throttled to 10/sec)
Frontend receives download-progress events with percent, speed, ETA
User can cancel via cancelModelDownload() which sets CancelToken
On completion, model is cached in huggingface cache directory
The turbo model uses the mobiuslabsgmbh/faster-whisper-large-v3-turbo repository (the same one faster-whisper maps to internally)
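
The cancellation and throttling steps of the download flow can be sketched in isolation. This sketch assumes a hypothetical fetch_chunk callable in place of huggingface_hub.snapshot_download() and its custom tqdm class. ThrottledProgress, download, and start_download are illustrative names, and only CancelToken, the 10 updates per second limit, and the daemon thread come from the description above.

```python
import threading
import time
from typing import Callable, Optional


class CancelToken:
    """Thread-safe flag that a download loop checks between chunks."""

    def __init__(self) -> None:
        self._event = threading.Event()

    def cancel(self) -> None:
        self._event.set()

    @property
    def cancelled(self) -> bool:
        return self._event.is_set()


class ThrottledProgress:
    """Forwards at most max_per_sec updates, but always forwards the final one."""

    def __init__(self, callback: Callable[[float], None], max_per_sec: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._callback = callback
        self._min_interval = 1.0 / max_per_sec
        self._clock = clock
        self._last_sent: Optional[float] = None

    def update(self, done: int, total: int) -> None:
        now = self._clock()
        is_final = done >= total
        too_soon = self._last_sent is not None and now - self._last_sent < self._min_interval
        if too_soon and not is_final:
            return  # drop intermediate updates that arrive too quickly
        self._last_sent = now
        self._callback(100.0 * done / total)


def download(total_chunks: int, token: CancelToken,
             on_progress: Callable[[float], None],
             fetch_chunk: Callable[[int], None]) -> bool:
    """Returns True on completion, False if cancelled part-way."""
    progress = ThrottledProgress(on_progress)
    for index in range(total_chunks):
        if token.cancelled:
            return False
        fetch_chunk(index)  # hypothetical stand-in for the real network transfer
        progress.update(index + 1, total_chunks)
    return True


def start_download(total_chunks: int, token: CancelToken,
                   on_progress: Callable[[float], None],
                   fetch_chunk: Callable[[int], None]) -> threading.Thread:
    """Runs download() on a daemon thread so it never blocks app shutdown."""
    thread = threading.Thread(
        target=download, args=(total_chunks, token, on_progress, fetch_chunk), daemon=True
    )
    thread.start()
    return thread
```

Injecting the clock keeps the throttle deterministic under test, and always forwarding the final update lets the UI reach 100% even when the last chunk arrives inside the throttle window.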

Key Patterns

Singleton controller: get_controller() returns singleton AppController instance
UI callbacks: Backend notifies frontend of state changes via callbacks set in set_ui_callbacks()
Thread-safe signals: Qt signals with QueuedConnection marshal UI updates from background threads to main thread
Background threads: Model loading, downloads, and transcription run in daemon threads
Domain logging: All services use get_logger(domain) for structured logging with domains like model, audio, hotkey, etc.
Custom hotkeys: Supports modifier-only combos (e.g., Ctrl+Win) and standard combos (e.g., Ctrl+R). Frontend captures keys, backend validates and registers.
Path alias: Frontend uses @/ for src/ imports (configured in tsconfig.json and vite.config.ts)
Lazy pyautogui: ClipboardService lazy-loads pyautogui via _get_pyautogui() to avoid mouseinfo's sys.exit() when tkinter is missing in bundled builds
Reduced effects on Linux: App.tsx adds reduced-effects class on Linux to disable backdrop-filter, animated orbs, and noise texture that cause sluggish UI under Qt WebEngine software rendering. Popup route is excluded (needs transparency).
Popup visibility: showPopup setting controls the floating recording indicator. Backend uses _popup_visible flag and on_popup_visibility_changed() callback. When hidden, resize_popup() is skipped to prevent re-showing.

Linux-Specific

evdev hotkeys: Linux uses evdev for keyboard input instead of keyboard library (Wayland-compatible)
Wayland clipboard: Uses wl-copy for clipboard, wtype/dotool/ydotool for paste keystroke, falls back to pyautogui via XWayland
Qt WebEngine flags: QTWEBENGINE_ENABLE_LINUX_ACCESSIBILITY=0 and Chromium flags (--use-gl=egl, --disable-gpu-sandbox) set in main.py to improve rendering performance
Hyprland rules: _setup_hyprland_window_rules() in main.py configures popup as floating, pinned, no-focus via hyprctl
Multi-monitor: get_active_monitor_info() re-detects cursor monitor on each recording start
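
The Wayland paste fallback chain can be sketched as a simple tool lookup. This is an assumption-laden illustration: is_wayland, pick_paste_backend, and the environment-variable check are hypothetical, and the sketch only selects a backend without building the tool-specific command line. The tool order and the pyautogui fallback follow the description above.

```python
import os
import shutil
from typing import Callable, Mapping, Optional

# Keystroke tools in the order listed above.
PASTE_TOOLS = ("wtype", "dotool", "ydotool")


def is_wayland(env: Mapping[str, str] = os.environ) -> bool:
    """Heuristic session check based on common environment variables."""
    return bool(env.get("WAYLAND_DISPLAY")) or env.get("XDG_SESSION_TYPE") == "wayland"


def pick_paste_backend(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Returns the first available Wayland keystroke tool, else 'pyautogui'."""
    if is_wayland(env):
        for tool in PASTE_TOOLS:
            if which(tool):  # tool is installed and on PATH
                return tool
    return "pyautogui"  # X11, or Wayland without any tool (runs via XWayland)
```

Passing env and which as parameters makes the selection testable on any platform.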

Testing

Python tests use pytest and are in src-pyloid/tests/. Test files include:

test_logger.py - Logger infrastructure tests
test_model_manager.py - Model download/cache management tests
test_transcription.py - Transcription service tests (slow, downloads model on first run)
test_audio.py, test_hotkey.py, test_clipboard.py, test_settings.py, test_app_controller.py

UI Components

Uses shadcn/ui (New York style) with Tailwind CSS v4. Add components via:

npx shadcn@latest add <component>

Git Commit Guidelines

Never add co-author lines to commit messages
Keep commit messages concise and descriptive
Use conventional commit prefixes: fix:, feat:, update:, refactor:, etc.
Follow the release guide at docs/plans/release-guide.md for creating releases
