- Goal: test how an adaptive engagement speech strategy affects human-robot interaction, using the Elmo robot as the embodiment and an LLM-powered dialogue stack.
- Setting: controlled study in which Elmo opens with the same topic prompt; the engagement tactic (independent variable) is varied between runs.
- Measurement: user engagement time and qualitative behavior are observed while the VAD/STT/LLM/TTS pipeline runs in real time.
- Constraints: low latency from VAD to TTS; reproducible runs with logging suitable for later analysis.
```mermaid
flowchart LR
    A[User] -->|Speech Utterance| B(VAD & Audio Buffer)
    B -->|Speech Audio Chunk| C(Faster-Whisper ASR)
    C -->|Transcribed Text| D(Engagement Scoring LLM)
    D -->|Engagement Score 1-5| E(Dialogue Generation LLM)
    E -->|Generated Response Text| F(Text-to-Speech Model)
    F -->|Generated Audio Response| A
```
```mermaid
stateDiagram
    direction LR
    [*] --> SileroVAD: ~32ms audio segments
    state vad_choice <<choice>>
    SileroVAD --> vad_choice: Detected speech?
    vad_choice --> SileroVAD: No
    vad_choice --> FasterWhisper: Yes
    FasterWhisper --> EngagementLLM: Transcribed Text
    EngagementLLM --> ResponseLLM: Score 1-5 + Text
    ResponseLLM --> [*]: Generated Response
```
```mermaid
stateDiagram
    direction LR
    State_High_Engagement --> State_Low_Engagement: Short answers / Long pauses
    State_Low_Engagement --> Repair_Strategy: Trigger threshold (<40%)
    state Repair_Strategy {
        Explicit_Check --> User_Response
        Topic_Switch --> User_Response
    }
    User_Response --> State_High_Engagement: User elaborates
    User_Response --> End_Interaction: User ignores
```
- ELMO starts the conversation with the same question/topic
- Repair Strategy
- Survival Rate (how long the conversation lasts)
- Response Latency
- Syntactic Complexity (short-term responses)
- Subjective Scores (RoSAS, Godspeed)
- Logs from the LLM thought process
- Engagement score (1-5) is produced per user turn to gauge attention/interest; score plus rationale feed the response LLM.
- Strategies to vary: mirroring affect, asking clarifying questions, adding light humor, summarizing/reframing, prompting elaboration, or keeping neutral baseline.
- Topic management: keep the initial conversation topic fixed; track state (last turns, score trend) to adapt responses without drifting topics.
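One hedged way to sketch the turn-state tracking above: keep a short window of recent engagement scores and pick an adaptation tactic from the trend. The strategy names are taken from the list above; the window size and thresholds are purely illustrative.

```python
# Hypothetical sketch of score-trend tracking for strategy selection.
# Thresholds and window size are illustrative assumptions.

from collections import deque

class TurnState:
    def __init__(self, window: int = 3):
        self.scores: deque[int] = deque(maxlen=window)  # last N scores (1-5)

    def record(self, score: int) -> None:
        self.scores.append(score)

    def strategy(self) -> str:
        if not self.scores:
            return "neutral_baseline"     # no signal yet: stay neutral
        avg = sum(self.scores) / len(self.scores)
        if avg >= 4:
            return "prompt_elaboration"   # engaged: invite the user to go deeper
        if avg >= 2.5:
            return "clarifying_question"  # middling: check understanding
        return "light_humor"              # low: try to re-engage
```

Keeping the decision stateless apart from the score window makes runs easy to log and reproduce; the fixed topic is never part of the decision.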
The source code for the project is located inside the SRProject directory.
To install the dependencies locally, run:

```shell
poetry install
```

To run the project:

```shell
poetry run python src/group-13/session.py
```

Be sure to populate the following environment variables (.env):
- `ENGAGEMENT_API_KEY`: your Gemini API key for the Engagement Evaluator model
- `DIALOGUE_API_KEY`: your Gemini API key for the Dialogue model
- `ENVIRONMENT`: either `ENGAGEMENT` or `CONTROL`, depending on the hypothesis being tested (defaults to `ENGAGEMENT`)
- `ROBOT_IP`: your robot's IP; if absent, the code runs in dry mode (without a robot)
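A minimal example `.env` using the variables above (the key and IP values are placeholders):

```ini
ENGAGEMENT_API_KEY=your-gemini-key-here
DIALOGUE_API_KEY=your-gemini-key-here
# ENGAGEMENT or CONTROL (defaults to ENGAGEMENT)
ENVIRONMENT=ENGAGEMENT
# omit ROBOT_IP to run in dry mode without a robot
ROBOT_IP=192.168.1.42
```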
- `src/group-13/vad.py`: VAD prototype using Silero.
- `src/group-13/stt.py`: STT prototype using Faster-Whisper.
- `src/group-13/main.py`: entry point; will orchestrate audio capture, VAD, STT, engagement scoring, response generation, TTS, and robot output.
- `TODO.md`: high-level project to-do list (setup, pipeline integration, study logistics).
- `assets/`: supporting artifacts (e.g., prompts, audio); fill as needed.
- LLM (Gemini) requires API keys; load them from env vars or a local `.env`/`config.toml` (not checked in). TTS uses gTTS locally (no Gemini TTS). When no Gemini key/library is present, the pipeline falls back to heuristics for engagement scoring and response generation.
- Audio I/O: choose the correct input/output devices; ensure the sample rate is compatible with the VAD/STT pipeline.
- Robot I/O: Elmo integration will need network/serial commands from https://github.com/S-Andrade/Robots/tree/main/Elmo; specify connection details in a config block.
- Prepare consent and briefing scripts; keep conversation topic constant.
- Validate audio path end-to-end (mic → VAD → STT → LLMs → TTS → robot speaker) before each session; heuristic fallback runs when Gemini keys are absent.
- Log timestamps, engagement scores, selected strategy, and user responses for analysis (CSV/JSON).
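A hedged sketch of such a per-turn log record, written as JSON Lines (one JSON object per line, which is easy to append to during a session and to load for analysis). The field names are illustrative, not a fixed schema.

```python
# Hypothetical per-turn logger: appends one JSON object per line.
# Field names are illustrative assumptions.

import json
import time

def log_turn(path: str, score: int, strategy: str, user_text: str) -> dict:
    record = {
        "timestamp": time.time(),    # seconds since epoch
        "engagement_score": score,   # 1-5 from the scoring LLM
        "strategy": strategy,        # adaptation tactic chosen this turn
        "user_text": user_text,      # transcribed user turn
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")  # JSON Lines format
    return record
```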
- Provide a fallback "no-robot" mode for laptop-only dry runs and demos.
- coqui-ai/TTS (optional fallback)
- Maximum of 10 turns
| Variable | Metric (Operationalization) | Supporting Citation |
|---|---|---|
| IV: Repair Strategy | Control: no deviation from script. Explicit: "meta-comment" on silence (e.g., "Are you still there?"). Implicit: "topic switch" or "humor" injection. | Fischer et al. (2019) vs. Niculescu et al. (2013) |
| DV: Survival Rate | Time (seconds) until the user says "Stop" or leaves. | Sidner et al. (2005) |
| DV: Response Latency | Avg. time (ms) between Robot_End and User_Start. | Skantze (2021) |
| DV: Verbal Density | Avg. words per turn (excluding "stop words"). | Ben-Youssef et al. (2017) |
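As a sketch, the three dependent variables in the table above can be computed from per-turn log records roughly as follows. The field names and the stop-word set are assumptions for illustration, not the study's actual schema.

```python
# Hypothetical DV computations over a list of per-turn records.
# Field names ("user_start", "user_end", "robot_end", "user_text")
# and the stop-word set are illustrative assumptions.

STOP_WORDS = {"the", "a", "an", "uh", "um"}  # tiny illustrative set

def survival_rate(turns: list[dict]) -> float:
    """Seconds from the first user turn start to the last turn end."""
    return turns[-1]["user_end"] - turns[0]["user_start"]

def mean_response_latency(turns: list[dict]) -> float:
    """Average gap (seconds) between robot finishing and user starting."""
    gaps = [t["user_start"] - t["robot_end"] for t in turns]
    return sum(gaps) / len(gaps)

def verbal_density(turns: list[dict]) -> float:
    """Average words per user turn, excluding stop words."""
    counts = [
        sum(1 for w in t["user_text"].lower().split() if w not in STOP_WORDS)
        for t in turns
    ]
    return sum(counts) / len(counts)
```

The latency sketch returns seconds; multiply by 1000 to report milliseconds as in the table.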