ElaRech is an intelligent, voice-interactive, multimodal Virtual Learning assistant that helps students explore and understand visual academic content such as diagrams, charts, handwritten notes, and academic papers. Just speak your query, upload an image, and ElaRech will answer both visually and audibly.
- Multimodal Model: meta-llama/llama-4-scout-17b-16e-instruct via Groq API
- Voice Recognition: Whisper
- TTS Engines: Google gTTS & ElevenLabs
- Groq API – ultra-fast LLM inference
- LLaMA-4 Vision model – meta-llama/llama-4-scout-17b-16e-instruct
- Whisper – speech recognition
- gTTS & ElevenLabs – voice output
- Gradio – web interface
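As an illustration of how a text query and an image can be combined for the multimodal model, here is a minimal sketch of building a message payload in the OpenAI-compatible chat format that Groq exposes. The helper name and exact field layout are assumptions for illustration, not ElaRech's actual code:

```python
import base64

# Hypothetical helper: packages a spoken query (already transcribed to text)
# together with an uploaded image as one multimodal chat message.
# The image is embedded as a base64 data URL, a common convention for
# OpenAI-compatible vision endpoints such as Groq's.
def build_multimodal_message(query: str, image_bytes: bytes) -> list:
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": query},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{encoded}"},
                },
            ],
        }
    ]
```

The resulting list can then be passed as the `messages` argument of a chat-completion call against `meta-llama/llama-4-scout-17b-16e-instruct`.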
- 🎙️ Voice Input: Speak your learning question naturally.
- 🧠 Multimodal AI: Combines your voice query with an uploaded image to give smart, context-aware answers.
- 🖼️ Image Understanding: Upload diagrams, charts, handwritten pages, or screenshots — ElaRech understands them.
- 💬 LLM-Powered Responses: Powered by `meta-llama/llama-4-scout-17b-16e-instruct` via the Groq API.
- 🔊 Dual TTS Engines: Replies are spoken aloud using both gTTS and ElevenLabs.
- 🌐 Gradio Web Interface: Clean, easy-to-use interface accessible from your browser.
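The feature list above amounts to a three-stage pipeline: transcribe the voice query, answer it against the image with the LLM, then speak the answer. A minimal sketch of that flow, with the stages injected as plain functions so it runs without network access (in the real app these would be Whisper, the Groq client, and gTTS/ElevenLabs; all names here are illustrative):

```python
# Illustrative wiring of the voice -> image -> LLM -> speech pipeline.
# Each stage is passed in as a callable, so this sketch makes no
# assumptions about ElaRech's internal module APIs.
def run_pipeline(transcribe, ask_llm, synthesize, audio_path, image_path):
    query = transcribe(audio_path)        # speech -> text (e.g. Whisper)
    answer = ask_llm(query, image_path)   # text + image -> LLM answer
    speech_file = synthesize(answer)      # answer text -> spoken audio file
    return query, answer, speech_file
```

Keeping the stages as injected callables also makes each one easy to swap (e.g. gTTS vs. ElevenLabs) or stub out in tests.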
```
ElaRech/
├── gradio_app.py                   # Main app with Gradio interface
├── brain_of_the_elaRech.py         # Core logic for image + query processing
├── .env                            # API keys
├── voice_of_Virtual Learninger.py
└── voice_of_user.py
```
```
git clone https://github.com/iamafridi/elaRech.git
cd elaRech
```

Create a `.env` file in the project root with your API keys:

```
GROQ_API_KEY=your_groq_api_key
ELEVENLABS_API_KEY=your_elevenlabs_api_key
```

Then launch the app:

```
python gradio_app.py
```
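At startup the app needs both keys from the `.env` file. A small sketch of validating them, assuming the `.env` values have been exported into the environment (for example via python-dotenv's `load_dotenv()`); the helper name is illustrative:

```python
import os

# Illustrative startup check: fail fast with a clear message if a
# required API key is missing, instead of erroring later mid-request.
def require_key(name: str) -> str:
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value
```

Called once for `GROQ_API_KEY` and `ELEVENLABS_API_KEY` before the Gradio interface launches, this turns a cryptic API error into an actionable setup message.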
Upload a diagram and ask:
🗣️ "Explain this process in simple terms."
📢 ElaRech will generate a voice and text response explaining the diagram based on your question.
📜 License
MIT License
Afridi Akbar Ifty
- GitHub: https://github.com/iamafridi
- Portfolio: https://iamafrididev.netlify.app
- LinkedIn: your-linkedin-profile
