
Add Bookworm OS support and Grok Voice AI integration#47

Open
obikata wants to merge 28 commits into nikivanov:master from obikata:feature/bookworm-support

Conversation


@obikata obikata commented Apr 4, 2026

Summary

  • Bookworm OS support: manual setup steps and code fixes for Raspberry Pi OS (Bookworm/Debian 12)
  • Grok Voice Chat: real-time voice conversation via the xAI Realtime API (push-to-talk)
  • Camera image recognition: Grok autonomously captures and analyzes camera frames based on the conversation context
  • UI overhaul: a modern glassmorphism interface

Main changes

Bookworm OS support

  • config.txt moved to /boot/firmware/
  • python-smbus → python3-smbus; pip requires --break-system-packages
  • SSL context fix (Python 3.13 compatibility)
  • raspivid → rpicam-vid
  • mimic1 → espeak (TTS)
  • Prevent crash when the PowerPlant (UPS) is not connected
  • Complete setup instructions added to SETUP.md

Grok Voice Chat

  • WebSocket proxy to the xAI Realtime API (/voiceChat)
  • Push-to-talk (G key starts the session, Space records)
  • Mic capture and audio playback in the browser (Web Audio API)
  • Watney introduces itself in Japanese on startup

Camera image recognition

  • Grok uses the camera autonomously via function calling
  • Snapshots are captured from the browser's video element (no stream interruption)
  • Images are analyzed with the Grok Vision API (grok-4-1-fast-non-reasoning)
  • Analysis results are spoken back naturally

UI

  • Glassmorphism (frosted-glass panels)
  • Rounded buttons and smooth animations
  • Pulse animation while recording
  • Voice chat transcript display

Test plan

  • Clean install on a Raspberry Pi 3A+ (following the steps in SETUP.md)
  • Verified browser access at https://<IP>:5000
  • G key starts voice chat → Space to talk
  • Camera image recognition triggers on questions about what the rover sees

obikata and others added 28 commits March 29, 2026 15:02
- Handle missing PowerPlant (UPS) hardware gracefully on I2C
- Add SETUP.md with full manual setup steps for new Raspberry Pi OS
- Document all Buster→Bookworm compatibility changes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Change TTSCommand from mimic1 to espeak in rover.conf
- Add branch checkout step to SETUP.md
- Minor doc fixes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use ssl.SSLContext(PROTOCOL_TLS_SERVER) instead of
ssl.create_default_context() which creates a client context
and breaks TLS on Python 3.13.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
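The fix above boils down to choosing the right kind of SSL context. A minimal sketch (the certificate paths are placeholders, not the PR's actual files):

```python
import ssl

# Broken on the server side: create_default_context() builds a *client*
# context (hostname checking on, server-cert verification), which breaks
# the TLS handshake on Python 3.13.
client_ctx = ssl.create_default_context()

# Correct server-side context, as the commit describes:
server_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
# server_ctx.load_cert_chain("cert.pem", "key.pem")  # placeholder paths
```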
raspivid is deprecated in Bookworm OS. Use libcamera-vid instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use ipv4.never-default to prevent eth0 from stealing the
default route from WiFi, which breaks internet access.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Command renamed from libcamera-vid to rpicam-vid in newer rpicam-apps.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add WebSocket proxy endpoint /voiceChat in server.py
- Add [XAI] config section to rover.conf
- Add voice_chat.js for mic capture, audio playback, and transcript
- Add voice chat button and transcript UI to index.html
- Add CSS styles for voice chat UI
- Keyboard shortcut: G key to toggle voice chat

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
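As a rough sketch of how the proxy might pick up its credentials, the snippet below reads an [XAI] section from rover.conf and builds the connection parameters. The key names (ApiKey, RealtimeUrl) and the default endpoint URL are illustrative assumptions, not the PR's actual values:

```python
import configparser

def xai_connect_params(conf_path="rover.conf"):
    """Read the [XAI] config section and build Realtime API connection params.

    Key names and the default URL are illustrative guesses, not taken
    from the PR's server.py.
    """
    cfg = configparser.ConfigParser()
    cfg.read(conf_path)
    xai = cfg["XAI"]
    url = xai.get("RealtimeUrl", "wss://api.x.ai/v1/realtime")  # assumed default
    headers = {"Authorization": f"Bearer {xai['ApiKey']}"}
    return url, headers
```

The proxy endpoint would then open a client WebSocket to this URL and relay frames in both directions between the browser and xAI.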
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
xAI uses 'response.output_audio.delta' and
'response.output_audio_transcript.delta' instead of
'response.audio.delta' and 'response.audio_transcript.delta'.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
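One way to absorb this naming difference is a small translation table in the proxy, so the rest of the code can keep handling one set of event types. A hedged sketch (the PR may instead branch on both names directly):

```python
# Map xAI's realtime event names onto the older names the handler expects.
XAI_EVENT_ALIASES = {
    "response.output_audio.delta": "response.audio.delta",
    "response.output_audio_transcript.delta": "response.audio_transcript.delta",
}

def normalize_event(event: dict) -> dict:
    """Return a copy of the event with its type renamed if xAI uses an alias."""
    event_type = event.get("type")
    if event_type in XAI_EVENT_ALIASES:
        event = {**event, "type": XAI_EVENT_ALIASES[event_type]}
    return event
```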
- Enable server-side voice activity detection (server_vad)
- Set pcm16 input/output audio format
- Enable input audio transcription
- Set modalities to text and audio

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
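The four settings above would all travel in a single session.update event. The sketch below uses OpenAI-style realtime field names, which the event names in this PR suggest xAI follows; the exact payload in server.py may differ (and a later commit replaces server_vad with manual commit):

```python
import json

# Sketch of the session.update event this commit describes; field names
# are assumptions based on the OpenAI-style realtime schema.
session_update = {
    "type": "session.update",
    "session": {
        "modalities": ["text", "audio"],
        "input_audio_format": "pcm16",
        "output_audio_format": "pcm16",
        "input_audio_transcription": {},           # enable transcription of mic input
        "turn_detection": {"type": "server_vad"},  # later disabled for push-to-talk
    },
}

wire_message = json.dumps(session_update)  # sent to xAI as a text frame
```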
Lower threshold and add silence/padding duration for
more reliable voice activity detection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Reset playback time when it falls behind current time
- Reset playback time on new response to prevent queue buildup

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Stop sending audio to xAI while AI response is playing
- Resume mic 500ms after response completes
- Prevents the "うん" ("uh-huh") response loop caused by echo/noise detection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
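The muting behavior is a small half-duplex gate. The real logic lives in the browser-side voice_chat.js; this is a language-neutral sketch of the same state machine, with the 500 ms delay from the commit:

```python
import time

class MicGate:
    """Half-duplex gate: mute the mic while AI audio plays, reopen 500 ms after.

    A sketch of the commit's behavior, not the PR's actual code.
    """
    RESUME_DELAY = 0.5  # seconds of grace after playback, per the commit

    def __init__(self):
        self._resume_at = 0.0  # mic starts open

    def on_playback_start(self):
        self._resume_at = float("inf")  # closed until playback finishes

    def on_playback_done(self, now=None):
        now = time.monotonic() if now is None else now
        self._resume_at = now + self.RESUME_DELAY

    def mic_open(self, now=None):
        now = time.monotonic() if now is None else now
        return now >= self._resume_at
```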
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Space key: hold to record, release to send
- G key: toggle voice chat session on/off
- Disable server VAD, use manual commit instead
- Red button indicator while recording
- Simpler and more reliable than VAD-based detection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add 3x gain node to amplify mic input
- Track if actual audio was captured during recording
- Only commit and request response if audio was detected
- Clear buffer on release if no audio detected

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
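The "only commit if audio was detected" check amounts to scanning the recorded PCM16 buffer for any sample above a noise floor. The actual check runs in voice_chat.js; here is an equivalent Python sketch (the threshold value is illustrative, and `array("h")` assumes a little-endian host, which matches the wire format):

```python
import array

def has_audio(pcm16: bytes, threshold: int = 500) -> bool:
    """True if any 16-bit sample in the buffer exceeds the noise threshold."""
    samples = array.array("h")  # signed 16-bit, native (little-endian) order
    samples.frombytes(pcm16)
    return any(abs(s) >= threshold for s in samples)
```

If this returns False when the Space key is released, the client clears the input buffer instead of committing it and requesting a response.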
- Auto-greet with self-introduction when voice chat starts
- Increase mic gain from 3x to 8x for better pickup
- Set instructions to always respond in Japanese

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Frosted glass panels with backdrop-filter blur
- Rounded corners and smooth transitions
- Flexbox layout for button panel
- Modern range slider and input styling
- Pulse animation on voice recording
- Inter font for cleaner typography
- Custom scrollbar for voice chat transcript

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reset current AI message div after each response completes,
so subsequent responses start on a new line instead of
appending to the same one.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Grok decides when to look through camera via function calling
- Browser captures frame from video element (no camera conflict)
- Server sends image to Grok Vision API for analysis
- Result is fed back to voice conversation naturally
- No video stream interruption

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
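The flow above can be sketched as a tool definition plus a dispatcher. The tool name, schema, and helper signatures below are illustrative stand-ins; the PR does not show its exact definitions:

```python
# Illustrative camera tool Grok can invoke via function calling.
CAMERA_TOOL = {
    "type": "function",
    "name": "look_through_camera",  # assumed name, not from the PR
    "description": "Capture the rover's current camera view for analysis.",
    "parameters": {"type": "object", "properties": {}, "required": []},
}

def handle_function_call(event, capture_frame, analyze_image):
    """Dispatch a realtime function_call event.

    capture_frame: grabs a base64 JPEG from the browser's <video> element,
    so the live stream is never interrupted.
    analyze_image: sends the frame to the Grok Vision API and returns a
    text description, which is fed back into the voice conversation.
    """
    if event.get("name") != "look_through_camera":
        return None
    image_b64 = capture_frame()
    return analyze_image(image_b64)
```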
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
grok-2-vision-latest does not exist in the xAI API.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
grok-2-vision models have been retired. Current xAI API uses
grok-4 series models which natively support image input.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
No longer needed after Vision API integration confirmed working.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- API key setup steps
- Permission requirements (Realtime + Chat)
- Usage guide (G key, Space for push-to-talk)
- Voice configuration options
- Security note about not committing API keys

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
