This video demonstrates a real-time multimodal web application where an AI assistant can see through your camera, hear your voice, and respond with natural speech.
You can show objects to the camera, ask questions out loud, and receive instant spoken responses based on what the AI sees and hears — creating a natural, low-latency conversation.
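Real-time vision apps like this one typically do not stream every rendered camera frame; they sample frames at a fixed interval to keep bandwidth and latency low. The sketch below shows one common way to gate frame sends. The helper name and the 500 ms interval are illustrative assumptions, not code from this repository.

```javascript
// Sketch of a frame throttler (illustrative; not from this repo).
// A capture loop calls shouldSendFrame(now) on every tick and only
// encodes/sends the camera frame when the gate opens.
function createFrameThrottler(intervalMs) {
  let lastSentAt = -Infinity; // timestamp of the last frame we sent
  return function shouldSendFrame(nowMs) {
    if (nowMs - lastSentAt >= intervalMs) {
      lastSentAt = nowMs;
      return true; // enough time has passed: capture and send this frame
    }
    return false; // too soon after the last send: skip this frame
  };
}

// Example: at a 500 ms interval (~2 frames/second)
const shouldSend = createFrameThrottler(500);
console.log(shouldSend(0));   // true  — first frame always goes out
console.log(shouldSend(100)); // false — only 100 ms since the last send
console.log(shouldSend(600)); // true  — 600 ms since the last send
```

In a browser client, the `nowMs` argument would typically come from `performance.now()` inside a `requestAnimationFrame` loop, with the selected frames drawn to a canvas and encoded before being sent over the live connection.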
```shell
# Clone the repository
git clone https://github.com/Ashot72/gemini-live-vision-AI

# Navigate into the project directory
cd gemini-live-vision-AI

# Copy the example .env file and add your project ID
cp env.example .env

# Place your service-account-key.json file in the project root directory

# Install dependencies
npm install

# Start the development server
npm start

# The app will be available at http://localhost:3000
```

- Open the Run view (View → Run or Ctrl+Shift+D) to access the debug configuration.
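The `.env` step above supplies the project ID to the server at startup. Node projects commonly load this with the `dotenv` package; the minimal parser below just illustrates the `KEY=VALUE` format such files use. The variable names `GOOGLE_PROJECT_ID` and `PORT` are assumptions for illustration, not necessarily the keys this repository expects.

```javascript
// Minimal sketch of .env parsing (illustrative; real apps usually use dotenv).
function parseEnv(text) {
  const env = {};
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed || trimmed.startsWith("#")) continue; // skip blanks and comments
    const eq = trimmed.indexOf("=");
    if (eq === -1) continue; // ignore malformed lines
    env[trimmed.slice(0, eq).trim()] = trimmed.slice(eq + 1).trim();
  }
  return env;
}

// Example .env contents (key names are hypothetical)
const sample = "# example .env\nGOOGLE_PROJECT_ID=my-gcp-project\nPORT=3000\n";
const env = parseEnv(sample);
console.log(env.GOOGLE_PROJECT_ID); // "my-gcp-project"
console.log(env.PORT);              // "3000"
```

With `dotenv`, the equivalent would be a single `require("dotenv").config()` call at the top of the server entry point, after which values are available on `process.env`.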
📺 Video: Watch on YouTube