Add Bookworm OS support and Grok Voice AI integration #47
Open
obikata wants to merge 28 commits into nikivanov:master
- Handle missing PowerPlant (UPS) hardware gracefully on I2C
- Add SETUP.md with full manual setup steps for new Raspberry Pi OS
- Document all Buster→Bookworm compatibility changes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Change TTSCommand from mimic1 to espeak in rover.conf
- Add branch checkout step to SETUP.md
- Minor doc fixes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER) instead of ssl.create_default_context(), which creates a client-side context by default and breaks server TLS on Python 3.13.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
raspivid is deprecated in Bookworm OS. Use libcamera-vid instead.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Set ipv4.never-default on eth0 so that it cannot take the default route away from WiFi; losing that route breaks internet access.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The command was renamed from libcamera-vid to rpicam-vid in newer rpicam-apps releases.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
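Because the camera command name differs across OS releases (raspivid, libcamera-vid, rpicam-vid), one possible approach is to probe for whichever binary is installed. The sketch below is an assumption about how that could be done, not what the PR does; `pick_camera_command` and the injectable `which` parameter are hypothetical.

```python
import shutil

# Newest name first, so a current system prefers rpicam-vid.
CANDIDATES = ("rpicam-vid", "libcamera-vid", "raspivid")


def pick_camera_command(which=shutil.which):
    """Return the first camera command found on PATH, or None.

    `which` is injectable so the lookup can be exercised without a camera
    stack installed.
    """
    for name in CANDIDATES:
        if which(name):
            return name
    return None
```

Probing keeps a single configuration usable on both older and newer images, at the cost of hiding which binary is actually in use.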
- Add WebSocket proxy endpoint /voiceChat in server.py
- Add [XAI] config section to rover.conf
- Add voice_chat.js for mic capture, audio playback, and transcript
- Add voice chat button and transcript UI to index.html
- Add CSS styles for voice chat UI
- Keyboard shortcut: G key to toggle voice chat
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
xAI uses the event types 'response.output_audio.delta' and 'response.output_audio_transcript.delta' instead of 'response.audio.delta' and 'response.audio_transcript.delta'.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
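A handler that tolerates both spellings of these event types might look like the sketch below. Only the four event type strings come from the commit message; the function name `classify_event` and the returned labels are hypothetical.

```python
# Event type strings quoted in the commit message above.
AUDIO_TYPES = {"response.output_audio.delta", "response.audio.delta"}
TRANSCRIPT_TYPES = {
    "response.output_audio_transcript.delta",
    "response.audio_transcript.delta",
}


def classify_event(event):
    """Map an incoming event dict to 'audio', 'transcript', or 'other'."""
    kind = event.get("type", "")
    if kind in AUDIO_TYPES:
        return "audio"
    if kind in TRANSCRIPT_TYPES:
        return "transcript"
    return "other"
```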
- Enable server-side voice activity detection (server_vad)
- Set pcm16 input/output audio format
- Enable input audio transcription
- Set modalities to text and audio
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Lower the VAD threshold and set silence and padding durations for more reliable voice activity detection.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
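The session settings described in the two commits above could be assembled roughly as shown here. This is a sketch under loud assumptions: the field names follow the commonly seen realtime-style `session.update` layout and are not confirmed by the PR, the transcription entry is a placeholder shape, and the numeric defaults are invented for illustration.

```python
import json


def build_session_update(threshold=0.3, silence_ms=500, padding_ms=300):
    """Return a session.update event as a JSON string.

    Field names and default values are assumptions, not taken from the PR.
    """
    event = {
        "type": "session.update",
        "session": {
            "modalities": ["text", "audio"],
            "input_audio_format": "pcm16",
            "output_audio_format": "pcm16",
            # Placeholder: the real transcription settings are not shown.
            "input_audio_transcription": {"enabled": True},
            "turn_detection": {
                "type": "server_vad",
                "threshold": threshold,
                "silence_duration_ms": silence_ms,
                "prefix_padding_ms": padding_ms,
            },
        },
    }
    return json.dumps(event)
```

Keeping the tuning values as parameters makes it easy to experiment with a lower threshold or longer silence window without editing the payload itself.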
- Reset playback time when it falls behind current time
- Reset playback time on new response to prevent queue buildup
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
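The playback-time logic in this commit lives in the browser, but the idea can be sketched in Python for clarity. The class and method names below are hypothetical, and times are plain seconds rather than an audio clock.

```python
class PlaybackScheduler:
    """Schedule audio chunks back to back without drifting into the past."""

    def __init__(self):
        self.next_start = 0.0

    def schedule(self, now, duration):
        """Return the start time for a chunk of `duration` seconds."""
        # If the queue has fallen behind the clock, restart from "now".
        if self.next_start < now:
            self.next_start = now
        start = self.next_start
        self.next_start = start + duration
        return start

    def reset(self, now):
        """Call when a new response begins so old backlog is dropped."""
        self.next_start = now
```

Without the first check, chunks that arrive after a pause would be scheduled in the past; without the reset, a new response would queue behind leftover audio from the previous one.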
- Stop sending audio to xAI while AI response is playing
- Resume mic 500ms after response completes
- Prevents the "うん" ("uh-huh") loop caused by echo/noise detection
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
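The echo suppression described above amounts to a small gate on the microphone stream. The sketch below shows the idea in Python with hypothetical names; the actual logic is in the browser script and may differ. Only the 500 ms delay comes from the commit message.

```python
class MicGate:
    """Decide whether captured audio should be forwarded upstream."""

    def __init__(self, resume_delay=0.5):
        self.resume_delay = resume_delay  # seconds; 500 ms per the commit
        self.playing = False
        self.resume_at = 0.0

    def on_response_started(self):
        self.playing = True

    def on_response_done(self, now):
        self.playing = False
        self.resume_at = now + self.resume_delay

    def should_send(self, now):
        """True only when no response is playing and the delay has passed."""
        return not self.playing and now >= self.resume_at
```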
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Space key: hold to record, release to send
- G key: toggle voice chat session on/off
- Disable server VAD, use manual commit instead
- Red button indicator while recording
- Simpler and more reliable than VAD-based detection
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add 3x gain node to amplify mic input
- Track if actual audio was captured during recording
- Only commit and request response if audio was detected
- Clear buffer on release if no audio detected
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
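The release-time decision in this commit can be sketched as follows. The detection threshold, the function names, and the event names (`input_audio_buffer.commit`, `response.create`, `input_audio_buffer.clear`) are assumptions modelled on realtime-style APIs; the PR does not show how audio presence is actually tracked.

```python
PCM16_MIN, PCM16_MAX = -32768, 32767


def apply_gain(samples, gain):
    """Amplify PCM16 samples, clipping to the 16-bit range."""
    return [max(PCM16_MIN, min(PCM16_MAX, int(s * gain))) for s in samples]


def has_audio(samples, threshold=500):
    """Hypothetical detector: any sample at or above the threshold."""
    return any(abs(s) >= threshold for s in samples)


def on_release(samples, gain=3.0):
    """Return the event types to send when the talk key is released."""
    boosted = apply_gain(samples, gain)
    if has_audio(boosted):
        return ["input_audio_buffer.commit", "response.create"]
    return ["input_audio_buffer.clear"]
```

Clearing the buffer on a silent release avoids asking the model to respond to an empty or noise-only recording.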
- Auto-greet with self-introduction when voice chat starts
- Increase mic gain from 3x to 8x for better pickup
- Set instructions to always respond in Japanese
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Frosted glass panels with backdrop-filter blur
- Rounded corners and smooth transitions
- Flexbox layout for button panel
- Modern range slider and input styling
- Pulse animation on voice recording
- Inter font for cleaner typography
- Custom scrollbar for voice chat transcript
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reset current AI message div after each response completes, so subsequent responses start on a new line instead of appending to the same one.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Grok decides when to look through camera via function calling
- Browser captures frame from video element (no camera conflict)
- Server sends image to Grok Vision API for analysis
- Result is fed back to voice conversation naturally
- No video stream interruption
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
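The flow in this commit (model requests a look, a frame is captured, the image is analyzed, the result returns to the conversation) can be sketched as a small dispatcher. Everything named here is hypothetical: the tool name `look_through_camera`, the callback parameters, and the shape of the returned event, which is modelled on realtime-style function-call outputs rather than taken from the PR.

```python
import json

# Hypothetical tool definition advertised to the voice session.
CAMERA_TOOL = {
    "type": "function",
    "name": "look_through_camera",
    "description": "Capture the current camera frame and describe it.",
    "parameters": {"type": "object", "properties": {}, "required": []},
}


def handle_function_call(call, capture_frame, analyze_image):
    """Run the camera tool and build the event that returns its result.

    capture_frame() stands in for the browser-side frame grab and
    analyze_image(frame) for the vision request; both are injected stubs.
    """
    if call["name"] != CAMERA_TOOL["name"]:
        raise ValueError("unknown tool: " + call["name"])
    frame = capture_frame()
    description = analyze_image(frame)
    return {
        "type": "conversation.item.create",
        "item": {
            "type": "function_call_output",
            "call_id": call["call_id"],
            "output": json.dumps({"description": description}),
        },
    }
```

Injecting the capture and analysis steps keeps the dispatcher independent of where the frame comes from, which matches the commit's point that the browser, not the server, grabs the image.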
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
grok-2-vision-latest does not exist in the xAI API.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
grok-2-vision models have been retired. The current xAI API uses grok-4 series models, which natively support image input.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
No longer needed after Vision API integration confirmed working.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- API key setup steps
- Permission requirements (Realtime + Chat)
- Usage guide (G key, Space for push-to-talk)
- Voice configuration options
- Security note about not committing API keys
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Main changes
Bookworm OS support
- config.txt location changed (/boot/firmware/)
- python-smbus → python3-smbus, pip --break-system-packages
- raspivid → rpicam-vid
- mimic1 → espeak (TTS)
Grok Voice Chat
- … /voiceChat)
Camera image recognition
- … grok-4-1-fast-non-reasoning) for image analysis
UI
Test plan
- Verify access at https://<IP>:5000