RinAI Multimodal V-Tuber & Desktop Agent

🤖 An open-source AI V-Tuber and desktop agent that combines speech processing, LLMs, and tool automation. Features include:

🎙️ Real-time STT/TTS with Groq & 11Labs
🐦 Twitter scheduling & automation
🧠 GraphRAG memory system
🔧 Extensible tool framework
🎮 VTube Studio integration
💬 YouTube chat interaction
💸 ElizaOS Twitter Client Integration

Built with Python, TypeScript, and modern AI services. Perfect for V-Tubing or as a powerful desktop assistant.

Architecture Overview

ElizaOS Twitter Client Integration

This project uses a custom fork of the ElizaOS Twitter Client that we've enhanced with an API server layer. Our version (agent-twitter-client) provides a REST API interface for the RinAI agent stack to schedule and manage tweets without requiring Twitter API keys.
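As a rough illustration of how the Python side might talk to this REST layer, the sketch below builds a JSON POST request for scheduling a tweet. The `/tweets/schedule` route, the `text` and `scheduled_for` field names, and the `build_schedule_request` helper are assumptions made for this example, not the documented API of agent-twitter-client; check the fork's server code for the real routes. The base URL is the local server address given under "Starting the Services".

```python
import json
import urllib.request
from datetime import datetime, timezone

# Local address of the Twitter API server (see "Starting the Services").
BASE_URL = "http://localhost:3000"


def build_schedule_request(text: str, when: datetime) -> urllib.request.Request:
    """Build (but do not send) a POST request that schedules a tweet.

    The route and JSON field names are hypothetical placeholders.
    """
    payload = {
        "text": text,
        # Send the time as ISO 8601 in UTC so the server does not have to guess a zone.
        "scheduled_for": when.astimezone(timezone.utc).isoformat(),
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/tweets/schedule",  # hypothetical route
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    when = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
    req = build_schedule_request("Hello from RinAI", when)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req, timeout=10)
```

Building the request separately from sending it keeps the example safe to run without the server, and makes the payload easy to inspect before anything is posted.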

Key Features:

Multimodal AI: Integrates speech-to-text, text-to-speech, large language models, and tool calling for rich and interactive conversations.
Live Streaming Ready: Designed for V-Tubing! Operate fully autonomously, engaging directly with chat or with a live host using speech-to-text.
Desktop Agent: Run the same stack in local agent mode as a voice-driven desktop assistant with tool calling, no live stream required.
Ultra-Fast Speech Processing: Utilizes Groq for Whisper AI, delivering lightning-fast and reliable speech-to-text transcription.
Tool-Calling Powerhouse: Equipped with tools including:
- Twitter Agent: Create and schedule tweets
- Task Scheduling Agent: Schedule tweet posting and other background tasks
- Web Queries with Reasoning: Leverage Perplexity's DeepSeek R1 API for web queries
- Cryptocurrency Price & Analytics: Obtain live and historical crypto price data
- Time & Date Conversion: Get current time in any location or convert times between different timezones
- Weather Updates: Get current weather conditions and forecasts for any location
Advanced Chat Agent: Based on the Rin AI Agentic Chat Stack:
- GraphRAG Memory: Graph-based memory for context-aware responses
- Keyword-Based Intent Recognition: Fast keyword extraction for memory relevance
- Advanced Context Summarization: Active summarization for maintaining conversation context
Smart LLM Gateway: Dynamically selects optimal LLM based on task complexity
Streaming Architecture: End-to-end streaming for minimal latency
Open Source & Extensible: Built to be customizable with community contributions welcome
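To make the "Smart LLM Gateway" idea above more concrete, here is a minimal sketch of complexity-based routing. It is not the project's actual gateway: the model identifiers, the hint-word list, the word-count threshold, and the `select_model` function are all hypothetical, chosen only to show the shape of the decision.

```python
from dataclasses import dataclass

# Hypothetical model tiers; the real gateway may use different models and rules.
FAST_MODEL = "fast-roleplay-llm"
STRONG_MODEL = "claude-3.5"

# Words that hint the request needs tools or multi-step reasoning (illustrative list).
COMPLEX_HINTS = {"schedule", "analyze", "compare", "convert", "price", "forecast", "why"}


@dataclass
class Route:
    model: str
    reason: str


def select_model(message: str, max_simple_words: int = 25) -> Route:
    """Pick a model tier from a crude complexity estimate of the message."""
    words = [w.strip(".,!?").lower() for w in message.split()]
    hints = sorted(COMPLEX_HINTS.intersection(words))
    if hints:
        return Route(STRONG_MODEL, f"matched hints: {', '.join(hints)}")
    if len(words) > max_simple_words:
        return Route(STRONG_MODEL, "long message")
    return Route(FAST_MODEL, "short conversational message")


if __name__ == "__main__":
    for msg in ("hi Rin, how are you?", "Schedule a tweet about the BTC price tomorrow"):
        print(msg, "->", select_model(msg))
```

A keyword heuristic like this is cheap enough to run on every message before any LLM call, which fits the low-latency streaming goal; a production gateway would likely use a richer signal than word matching.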

Tech Stack:

Backend: Python, Node.js/TypeScript
LLMs: Role-Playing LLM, Claude 3.5
Speech Processing: Groq Whisper AI (STT), 11Labs (TTS)
Frontend: VTube Studio, OBS Studio
Audio: FFmpeg, VoiceMeeter Banana (Windows)

Getting Started (Windows):

System Requirements:
- Windows 10/11
- VoiceMeeter Banana
- VTube Studio
- OBS Studio
- FFmpeg (Add to PATH)
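Development Prerequisites:
- Git
- Python (with venv and pip)
- Node.js and npm
- MongoDB and Neo4j (both must be running before the services start)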
API Keys Required:
- Groq (Speech-to-Text)
- 11Labs (Text-to-Speech)
- Perplexity (Web Queries)
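Before the first run, a small preflight script can confirm that FFmpeg is on PATH and that the three API keys are set. The sketch below is one way to do that. The environment variable names are assumptions for illustration; use the names from the repository's `.env.example` file instead.

```python
import os
import shutil
from typing import Mapping

# Hypothetical variable names; use the names from .env.example in the repository.
REQUIRED_KEYS = {
    "GROQ_API_KEY": "Groq (speech-to-text)",
    "ELEVENLABS_API_KEY": "11Labs (text-to-speech)",
    "PERPLEXITY_API_KEY": "Perplexity (web queries)",
}


def missing_keys(env: Mapping[str, str]) -> list[str]:
    """Return the required variable names that are unset or blank in env."""
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]


def ffmpeg_on_path() -> bool:
    """True if an ffmpeg executable can be found on PATH."""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    for name in missing_keys(os.environ):
        print(f"Missing {name}: needed for {REQUIRED_KEYS[name]}")
    if not ffmpeg_on_path():
        print("ffmpeg was not found on PATH")
```

The script only reads the process environment, so values kept in a `.env` file need to be loaded or exported first.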

Installation:

# Clone main repository
git clone https://github.com/dleerdefi/rinai-multimodal-vtuber
cd rinai-multimodal-vtuber

# Setup Python environment
python -m venv venv
.\venv\Scripts\activate  # On macOS/Linux: source venv/bin/activate
pip install -r requirements.txt

# Setup Twitter API Client (cloned into twitter-client, the directory name used below)
git clone https://github.com/dleerdefi/agent-twitter-client twitter-client
cd twitter-client
npm install

Starting the Services:

a. Start the Twitter API Server:
```
cd twitter-client
npx ts-node server.ts  # Or the correct server startup file
```
- Verify the server is running at http://localhost:3000
b. Start the Main RinAI Server:
```
cd rinai-multimodal-vtuber
python src/scripts/run_stream.py
```
Follow the prompts to:
- Choose between streaming or local agent mode
- Select your microphone device
- Enable/disable YouTube chat (if streaming)
c. Access the Web Interface:
- Open your browser to http://localhost:8765
- You should see the retro-style chat interface
- Messages will appear as they're processed
d. Available Hotkeys:
- Alt+S: Toggle speech input
- Alt+P: Pause/Resume all services
- Alt+Q: Quit

Each service needs to run in its own terminal window. Make sure MongoDB and Neo4j are running before starting the services.
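A quick way to see which pieces are up is to probe their TCP ports. The sketch below does this. Ports 3000 and 8765 come from the steps above; 27017 and 7687 are the default MongoDB and Neo4j Bolt ports and are assumptions about your setup, as is the `port_open` helper itself.

```python
import socket

# Assumed endpoints: 3000 and 8765 come from this README; 27017 and 7687 are the
# default MongoDB and Neo4j Bolt ports and may differ in your setup.
SERVICES = {
    "Twitter API server": ("localhost", 3000),
    "RinAI web interface": ("localhost", 8765),
    "MongoDB": ("localhost", 27017),
    "Neo4j (Bolt)": ("localhost", 7687),
}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, (host, port) in SERVICES.items():
        state = "up" if port_open(host, port) else "not reachable"
        print(f"{name:22} {host}:{port}  {state}")
```

An open port only shows that something is listening, not that the service is healthy, so treat this as a first check. The web interface will show as not reachable until the main RinAI server has started.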

Open Source and Contributions:

We welcome contributions! To get started:

Fork this repository
Create a new branch for your feature/fix
Submit a Pull Request

License:

MIT License - see LICENSE file for details
