Maintaining Engagement in Human Conversations

SO-REAL Project Group 13 - Social Robotics and Human-Robot Interaction

Project Context

  • Goal: test how an adaptive engagement speech strategy affects human-robot interaction using the Elmo robot as the embodiment and an LLM-powered dialogue stack.

  • Setting: controlled study where Elmo opens with the same topic prompt; engagement tactics (independent variable) are varied between runs.

  • Measurement: user engagement time and qualitative behavior are observed while VAD/STT/LLM/TTS pipeline runs in real-time.

  • Constraints: low latency from VAD to TTS; reproducible runs with logging suitable for later analysis.

Conversation Workflow

```mermaid
flowchart LR
    A[User] -->|Speech Utterance| B(VAD & Audio Buffer)
    B -->|Speech Audio Chunk| C(Faster-Whisper ASR)
    C -->|Transcribed Text| D(Engagement Scoring LLM)
    D -->|Engagement Score 1-5| E(Dialogue Generation LLM)
    E -->|Generated Response Text| F(Text-to-Speech Model)
    F -->|Generated Audio Response| A
```
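The turn pipeline above can be sketched as a single function chaining the four stages. The stage bodies here are hypothetical placeholders standing in for Silero VAD output, Faster-Whisper, the two Gemini calls, and TTS; the word-count heuristic mirrors the repo's documented fallback when no Gemini key is present.

```python
from dataclasses import dataclass


@dataclass
class TurnResult:
    transcript: str
    engagement_score: int  # 1-5, normally produced by the engagement LLM
    response_text: str


def transcribe(audio_chunk: bytes) -> str:
    # Placeholder for Faster-Whisper ASR; here we just decode the bytes.
    return audio_chunk.decode("utf-8", errors="ignore")


def score_engagement(transcript: str) -> int:
    # Placeholder for the engagement-scoring LLM: longer, elaborated
    # answers read as more engaged (illustrative heuristic fallback).
    words = len(transcript.split())
    return max(1, min(5, 1 + words // 3))


def generate_response(transcript: str, score: int) -> str:
    # Placeholder for the dialogue LLM; the real prompt also carries
    # the score's rationale.
    if score <= 2:
        return "Are you still with me? We can switch topics if you like."
    return "Interesting! Tell me more about that."


def run_turn(audio_chunk: bytes) -> TurnResult:
    """One pass through VAD-gated audio -> ASR -> scoring -> response."""
    transcript = transcribe(audio_chunk)
    score = score_engagement(transcript)
    return TurnResult(transcript, score, generate_response(transcript, score))
```

Swapping any placeholder for its real model leaves `run_turn` unchanged, which is what makes the ENGAGEMENT/CONTROL conditions easy to toggle.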

Real-time Processing

```mermaid
stateDiagram
    direction LR
    [*] --> SileroVAD: ~32ms audio segments
    state vad_choice <<choice>>
    SileroVAD --> vad_choice: Detected speech?
    vad_choice --> SileroVAD: No
    vad_choice --> FasterWhisper: Yes
    FasterWhisper --> EngagementLLM: Transcribed Text
    EngagementLLM --> ResponseLLM: Score 1-5 + Text
    ResponseLLM --> [*]: Generated Response
```
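The VAD gate above buffers ~32 ms frames until a run of silence closes the utterance, which is then handed to Faster-Whisper. A minimal sketch of that buffering logic, using a simple RMS-energy threshold as a stand-in for the Silero model (the frame size matches Silero's 512-sample window at 16 kHz; the threshold and silence budget are illustrative):

```python
import math

FRAME_MS = 32
SAMPLE_RATE = 16_000
FRAME_SAMPLES = SAMPLE_RATE * FRAME_MS // 1000  # 512 samples per ~32 ms frame


def frame_rms(frame: list[float]) -> float:
    """Root-mean-square energy of one audio frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def collect_utterance(frames, threshold=0.05, max_silence_frames=10):
    """Accumulate speech frames until a run of silence ends the utterance.

    `threshold` stands in for the Silero speech-probability decision;
    `max_silence_frames` is the silence budget before we cut the buffer.
    """
    buffer, silence = [], 0
    for frame in frames:
        if frame_rms(frame) >= threshold:
            buffer.append(frame)
            silence = 0
        elif buffer:  # only count silence once speech has started
            silence += 1
            if silence >= max_silence_frames:
                break
    return buffer
```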

Disengagement Loop

```mermaid
stateDiagram
    direction LR
    State_High_Engagement --> State_Low_Engagement : Short answers / Long pauses
    State_Low_Engagement --> Repair_Strategy : Trigger Threshold (<40%)
    state Repair_Strategy {
        Explicit_Check --> User_Response
        Topic_Switch --> User_Response
    }
    User_Response --> State_High_Engagement : User elaborates
    User_Response --> End_Interaction : User ignores
```

Study Variables

Static Variables

  • Elmo starts the conversation with the same question/topic

Independent Variables

  • Repair Strategy

Dependent Variables

  • Survival Rate (how long the conversation lasts)

Measurements

  • Response Latency

  • Syntactic Complexity (short vs. elaborated responses)

  • Subjective Scores (RoSAS, Godspeed)

  • Logs of the LLM's reasoning process

Engagement Strategy Notes

  • Engagement score (1-5) is produced per user turn to gauge attention/interest; score plus rationale feed the response LLM.
  • Strategies to vary: mirroring affect, asking clarifying questions, adding light humor, summarizing/reframing, prompting elaboration, or keeping neutral baseline.
  • Topic management: keep the initial conversation topic fixed; track state (last turns, score trend) to adapt responses without drifting topics.
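The state tracking mentioned above (last turns plus score trend) can be sketched as a small rolling window that the response LLM's prompt is built from. The class name, window size, and context format are assumptions for illustration; only the scored-turn history and trend signal come from the notes.

```python
from collections import deque


class ConversationState:
    """Track recent turns and the engagement-score trend for prompt building."""

    def __init__(self, window: int = 5):
        self.turns = deque(maxlen=window)  # (user_text, score) pairs

    def record(self, user_text: str, score: int) -> None:
        self.turns.append((user_text, score))

    def score_trend(self) -> int:
        """+1 rising, -1 falling, 0 flat or insufficient data."""
        scores = [s for _, s in self.turns]
        if len(scores) < 2:
            return 0
        delta = scores[-1] - scores[0]
        return (delta > 0) - (delta < 0)

    def prompt_context(self) -> str:
        """Summarize recent state for the response LLM (format is illustrative)."""
        lines = [f"user: {t} (engagement {s}/5)" for t, s in self.turns]
        return "\n".join(lines + [f"trend: {self.score_trend():+d}"])
```

Capping the window keeps the prompt short (helping the latency constraint) while still exposing whether engagement is rising or falling.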

Development

The source code for the project is located inside the SRProject directory.

Requirements

Building / Running

To install the dependencies locally, run:

```shell
poetry install
```

To run the project:

```shell
poetry run python src/group-13/session.py
```

Be sure to populate the following environment variables (.env):

  • ENGAGEMENT_API_KEY: Your Gemini API key for the Engagement Evaluator model

  • DIALOGUE_API_KEY: Your Gemini API key for the Dialogue model

  • ENVIRONMENT: can be either ENGAGEMENT or CONTROL, depending on the hypothesis being tested (defaults to ENGAGEMENT)

  • ROBOT_IP: Your robot's IP address; if absent, the code runs in dry mode (without a robot)

Repository Layout

  • src/group-13/vad.py: VAD prototype using Silero.
  • src/group-13/stt.py: STT prototype using Faster-Whisper.
  • src/group-13/main.py: entry point; will orchestrate audio capture, VAD, STT, engagement scoring, response generation, TTS, and robot output.
  • TODO.md: high-level project to-do list (setup, pipeline integration, study logistics).
  • assets/: supporting artifacts (e.g., prompts, audio) — fill as needed.

Environment / Keys

  • LLM (Gemini) requires API keys; load from env vars or a local .env/config.toml (not checked in). TTS uses gTTS locally (no Gemini TTS). When no Gemini key/library is present, the pipeline falls back to heuristics for engagement/response.
  • Audio I/O: choose correct input/output devices; ensure sample rate compatible with VAD/STT pipeline.
  • Robot I/O: Elmo integration will need network/serial commands from https://github.com/S-Andrade/Robots/tree/main/Elmo; specify connection details in a config block.

Study Operations Checklist (high level)

  • Prepare consent and briefing scripts; keep conversation topic constant.
  • Validate audio path end-to-end (mic → VAD → STT → LLMs → TTS → robot speaker) before each session; heuristic fallback runs when Gemini keys are absent.
  • Log timestamps, engagement scores, selected strategy, and user responses for analysis (CSV/JSON).
  • Provide a fallback "no-robot" mode for laptop-only dry runs and demos.
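The per-turn CSV logging in the checklist could look like the sketch below; the column names are assumptions chosen to cover the fields the checklist asks for (timestamps, engagement scores, selected strategy, user responses).

```python
import csv
import io
import time

# Hypothetical column set covering the checklist's logging requirements.
FIELDS = ["timestamp", "turn", "engagement_score", "strategy",
          "user_text", "robot_text"]


def log_turn(writer: csv.DictWriter, turn: int, score: int,
             strategy: str, user_text: str, robot_text: str) -> dict:
    """Append one conversation turn to the session log and return the row."""
    row = {
        "timestamp": time.time(),
        "turn": turn,
        "engagement_score": score,
        "strategy": strategy,
        "user_text": user_text,
        "robot_text": robot_text,
    }
    writer.writerow(row)
    return row
```

In a session this would write to a per-run file; an in-memory buffer works the same way for dry runs:

```python
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=FIELDS)
writer.writeheader()
log_turn(writer, turn=1, score=3, strategy="neutral",
         user_text="hello", robot_text="hi there")
```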

Technologies Used

  • VAD
  • STT
  • VAD + STT
  • LLM
  • TTS

Conversation Turn Limit

  • Maximum of 10 turns
Variable Operationalization

| Variable | Metric (Operationalization) | Supporting Citation |
| --- | --- | --- |
| IV: Repair Strategy | Control: no deviation from script. Explicit: "meta-comment" on silence (e.g., "Are you still there?"). Implicit: "topic switch" or "humor" injection. | Fischer et al. (2019) vs. Niculescu et al. (2013) |
| DV: Survival Rate | Time (seconds) until the user says "Stop" or leaves. | Sidner et al. (2005) |
| DV: Response Latency | Avg. time (ms) between Robot_End and User_Start. | Skantze (2021) |
| DV: Verbal Density | Avg. words per turn (excluding "stop words"). | Ben-Youssef et al. (2017) |
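The three dependent variables reduce to simple computations over the logged timestamps and transcripts. A minimal sketch (the stop-word list is an illustrative placeholder, not the study's actual list):

```python
STOP_WORDS = {"the", "a", "an", "uh", "um", "like", "you", "know"}  # illustrative


def survival_seconds(event_timestamps: list[float]) -> float:
    """DV Survival Rate: seconds from first to last logged event."""
    return event_timestamps[-1] - event_timestamps[0]


def mean_latency_ms(robot_end: list[float], user_start: list[float]) -> float:
    """DV Response Latency: avg. ms between Robot_End and the next User_Start."""
    gaps = [(u - r) * 1000 for r, u in zip(robot_end, user_start)]
    return sum(gaps) / len(gaps)


def verbal_density(user_turns: list[str]) -> float:
    """DV Verbal Density: avg. content words per user turn."""
    counts = [sum(1 for w in turn.lower().split() if w not in STOP_WORDS)
              for turn in user_turns]
    return sum(counts) / len(counts)
```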
