🤖 An open-source AI V-Tuber and desktop agent that combines speech processing, LLMs, and tool automation. Features include:
- 🎙️ Real-time STT/TTS with Groq & 11Labs
- 🐦 Twitter scheduling & automation
- 🧠 GraphRAG memory system
- 🔧 Extensible tool framework
- 🎮 VTube Studio integration
- 💬 YouTube chat interaction
- 💸 ElizaOS Twitter Client Integration
Built with Python, TypeScript, and modern AI services. Perfect for V-Tubing or as a powerful desktop assistant.
This project uses a custom fork of the ElizaOS Twitter Client that we've enhanced with an API server layer. Our version (agent-twitter-client) provides a REST API interface for the RinAI agent stack to schedule and manage tweets without requiring Twitter API keys.
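Since the forked client exposes a REST API rather than requiring Twitter API keys, scheduling a tweet is just an HTTP call from the Python side. The sketch below shows the general shape of such a call using only the standard library; the route (`/api/tweets/schedule`) and payload field names are assumptions for illustration — check the agent-twitter-client source for the actual endpoints.

```python
import json
import urllib.request

# Assumed base URL (the README verifies the server at localhost:3000).
API_BASE = "http://localhost:3000"

def build_schedule_request(text: str, scheduled_time: str) -> urllib.request.Request:
    """Build a POST request asking the API server to schedule a tweet.

    The route and JSON field names below are illustrative assumptions,
    not documented API.
    """
    payload = json.dumps({"text": text, "scheduledTime": scheduled_time}).encode()
    return urllib.request.Request(
        f"{API_BASE}/api/tweets/schedule",  # assumed route
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_schedule_request("gm from RinAI", "2025-01-01T12:00:00Z")
# urllib.request.urlopen(req)  # uncomment once the Twitter API server is running
```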
Key Features:
- Multimodal AI: Integrates speech-to-text, text-to-speech, large language models, and tool calling for rich and interactive conversations.
- Live Streaming Ready: Designed for V-Tubing! Operate fully autonomously, engaging directly with chat or with a live host using speech-to-text.
- Desktop Agent: Run locally as a voice-controlled desktop assistant, with the same tool-calling and memory stack available off-stream.
- Ultra-Fast Speech Processing: Utilizes Groq for Whisper AI, delivering lightning-fast and reliable speech-to-text transcription.
- Tool-Calling Powerhouse: Equipped with tools including:
  - Twitter Agent: Create and schedule tweets
  - Task Scheduling Agent: Schedule tweet posting and other background tasks
  - Web Queries with Reasoning: Leverage Perplexity's DeepSeek R1 API for web queries
  - Cryptocurrency Price & Analytics: Obtain live and historical crypto price data
  - Time & Date Conversion: Get the current time in any location or convert times between timezones
  - Weather Updates: Get current weather conditions and forecasts for any location
- Advanced Chat Agent: Built on the Rin AI Agentic Chat Stack:
  - GraphRAG Memory: Graph-based memory for context-aware responses
  - Keyword-Based Intent Recognition: Fast keyword extraction for memory relevance
  - Advanced Context Summarization: Active summarization for maintaining conversation context
  - Smart LLM Gateway: Dynamically selects the optimal LLM based on task complexity
  - Streaming Architecture: End-to-end streaming for minimal latency
- Open Source & Extensible: Built to be customizable with community contributions welcome
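The "Smart LLM Gateway" idea above — routing simple turns to a fast model and complex, tool-heavy turns to a stronger one — can be sketched in a few lines. The scoring heuristic and model names below are illustrative assumptions, not RinAI's actual routing logic.

```python
# Illustrative gateway sketch: estimate task complexity, then pick a model tier.
# Keywords and model identifiers are assumptions for demonstration only.
TOOL_KEYWORDS = {"schedule", "tweet", "weather", "price", "convert"}

def complexity_score(message: str) -> int:
    """Crude complexity estimate: message length plus tool-like keywords."""
    words = message.lower().split()
    score = len(words) // 20  # long messages cost more
    score += sum(1 for w in words if w in TOOL_KEYWORDS)
    return score

def select_model(message: str) -> str:
    """Route to a stronger model once the estimated complexity crosses a threshold."""
    return "claude-3-5-sonnet" if complexity_score(message) >= 1 else "fast-roleplay-llm"
```

A real gateway would likely also weigh conversation history and whether a tool call is pending, but the dispatch shape stays the same.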
Tech Stack:
- Backend: Python, Node.js/TypeScript
- LLMs: Role-Playing LLM, Claude 3.5
- Speech Processing: Groq Whisper AI (STT), 11Labs (TTS)
- Frontend: VTube Studio, OBS
- Audio: FFmpeg, VoiceMeeter Banana (Windows)
Getting Started (Windows):
System Requirements:
- Windows 10/11
- VoiceMeeter Banana
- VTube Studio
- OBS Studio
- FFmpeg (Add to PATH)
Development Prerequisites:
- Python
- Node.js (for the Twitter API client)
- MongoDB
- Neo4j
API Keys Required:
- Groq (Speech-to-Text)
- 11Labs (Text-to-Speech)
- Perplexity (Web Queries)
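A small helper that fails loudly when a key is missing saves debugging time later. The environment variable names below are assumptions — match them to whatever the project's `.env` or config actually uses.

```python
import os

# Assumed environment variable names for the three required services.
REQUIRED_KEYS = ["GROQ_API_KEY", "ELEVENLABS_API_KEY", "PERPLEXITY_API_KEY"]

def load_api_keys() -> dict:
    """Return all required API keys, raising if any is missing or empty."""
    missing = [k for k in REQUIRED_KEYS if not os.environ.get(k)]
    if missing:
        raise RuntimeError(f"Missing API keys: {', '.join(missing)}")
    return {k: os.environ[k] for k in REQUIRED_KEYS}
```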
Installation:
```bash
# Clone main repository
git clone https://github.com/dleerdefi/rinai-multimodal-vtuber
cd rinai-multimodal-vtuber

# Set up Python environment
python -m venv venv
source venv/bin/activate  # On Windows: .\venv\Scripts\activate
pip install -r requirements.txt

# Set up Twitter API client
git clone https://github.com/dleerdefi/agent-twitter-client
cd agent-twitter-client
npm install
```
Starting the Services:
a. Start the Twitter API Server:
```bash
cd agent-twitter-client
npx ts-node server.ts  # adjust if the server entry point differs
```
- Verify the server is running at http://localhost:3000
b. Start the Main RinAI Server:
```bash
cd rinai-multimodal-vtuber
python src/scripts/run_stream.py
```
Follow the prompts to:
- Choose between streaming or local agent mode
- Select your microphone device
- Enable/disable YouTube chat (if streaming)
c. Access the Web Interface:
- Open your browser to http://localhost:8765
- You should see the retro-style chat interface
- Messages will appear as they're processed
d. Available Hotkeys:
- Alt+S: Toggle speech input
- Alt+P: Pause/Resume all services
- Alt+Q: Quit
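Internally, hotkey handling like the above amounts to a dispatch table mapping key combos to handlers. This is a sketch of that shape only — the real app presumably wires these into its speech and service controls via a global-hotkey library, which is an assumption here.

```python
# Sketch of a hotkey dispatch table matching the bindings above.
# The state dict and handlers are placeholders, not RinAI's actual internals.
state = {"speech_enabled": True, "paused": False, "running": True}

def toggle_speech() -> None:
    state["speech_enabled"] = not state["speech_enabled"]

def toggle_pause() -> None:
    state["paused"] = not state["paused"]

def quit_app() -> None:
    state["running"] = False

HOTKEYS = {
    "alt+s": toggle_speech,
    "alt+p": toggle_pause,
    "alt+q": quit_app,
}

def handle_hotkey(combo: str) -> None:
    """Look up and run the handler for a key combo, ignoring unknown combos."""
    action = HOTKEYS.get(combo.lower())
    if action:
        action()
```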
Each service needs to run in its own terminal window. Make sure MongoDB and Neo4j are running before starting the services.
Open Source and Contributions:
We welcome contributions! To get started:
- Fork this repository
- Create a new branch for your feature/fix
- Submit a Pull Request
License:
MIT License - see LICENSE file for details

