feat: add HandoffMetrics for measuring agent handoff timing #5442

Closed
longcw wants to merge 4 commits into main from longc/handoff-metrics
@longcw longcw commented Apr 14, 2026

Summary

  • Add HandoffMetrics to measure agent handoff cost, emitted via metrics_collected and attached to AgentHandoff.metrics
  • Tracks orchestration timing: drain_duration, new_activity_duration, on_enter_duration, and total duration
  • Reports whether STT pipeline and realtime session were reused across the handoff (stt_reused, realtime_session_reused)

Limitation

HandoffMetrics does not include the WebSocket connection time for new realtime sessions or STT streams. When resources are not reused, the connection is established asynchronously in a background task. This connection time is already reported separately via RealtimeModelMetrics.acquire_time and STTMetrics.acquire_time through the existing metrics_collected pipeline.

Add HandoffMetrics to measure the cost of agent handoffs, including:
- drain_duration: time to drain the old activity (on_exit + pending speech)
- new_activity_duration: time to start/resume the new activity
- on_enter_duration: time for the new agent's on_enter callback
- stt_reused / realtime_session_reused: whether resources were reused
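
The fields listed above can be pictured as a plain dataclass. This is a minimal sketch built only from the field names mentioned in this PR; the class name `HandoffMetricsSketch`, the defaults, and the sample values are assumptions for illustration, not the code in the diff.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class HandoffMetricsSketch:
    """Illustrative stand-in for the HandoffMetrics described in this PR."""

    duration: float = 0.0  # total handoff duration, in seconds
    drain_duration: float = 0.0  # draining the old activity (on_exit + pending speech)
    new_activity_duration: float = 0.0  # starting or resuming the new activity
    on_enter_duration: float = 0.0  # the new agent's on_enter callback
    stt_reused: bool = False  # STT pipeline kept across the handoff
    realtime_session_reused: bool = False  # realtime session kept across the handoff
    old_agent_id: str | None = None
    new_agent_id: str | None = None


# Hypothetical values, only to show how the fields fit together.
sample = HandoffMetricsSketch(
    duration=0.42,
    drain_duration=0.30,
    new_activity_duration=0.10,
    stt_reused=True,
    new_agent_id="billing_agent",
)
print(sample)
```

Connection time for new STT streams or realtime sessions is deliberately absent from this shape, matching the limitation stated above.
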
@chenghao-mou chenghao-mou requested a review from a team April 14, 2026 05:46
@devin-ai-integration devin-ai-integration Bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.

realtime_session_reused: bool = False
"""Whether the realtime session was reused from the previous agent."""
old_agent_id: str | None = None
new_agent_id: str | None = None
Member

nit: Do we need these two if we have them in AgentHandoff already?

new_agent_id=handoff_item.new_agent_id,
)
handoff_item.metrics = handoff_metrics
self.emit("metrics_collected", MetricsCollectedEvent(metrics=handoff_metrics))
Member

Having it at the session-level makes sense, but we did just deprecate the session-level metrics_collected event. Should we put it somewhere else?

Contributor Author

Yeah, that's an issue. One option is to put it in AgentHandoff and then emit it via self.emit("conversation_item_added", ConversationItemAddedEvent(item=handoff_item)), but the metrics only become available after the handoff item has been emitted.

Member

Should we delay emitting "conversation_item_added" until the handoff metrics are available?

Contributor Author

This may cause the handoff event to be emitted after the reply from on_enter?

Member

I see. Semantically, handoff happens before any reply in on_enter. Then maybe we should exclude new activity start/resume and on_enter from handoff metrics. Instead, we can have more generic Agent/AgentTask-level metrics measuring those.

Contributor Author

yeah I agree, we shouldn't include on_enter in the handoff metrics.
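
The ordering concern in this thread can be reproduced with a toy model. The sketch below does not use the livekit-agents API; `run_handoff`, the event names, and the `delay_handoff_event` flag are all hypothetical. It only shows that waiting for on_enter to finish before emitting the handoff item puts that item after any reply on_enter produces.

```python
import asyncio


async def run_handoff(delay_handoff_event: bool) -> list[str]:
    """Return the order in which a listener would observe events (toy model)."""
    events: list[str] = []

    async def on_enter() -> None:
        # Stand-in for an on_enter callback that generates a reply.
        await asyncio.sleep(0)
        events.append("reply_from_on_enter")

    if not delay_handoff_event:
        # Emit the handoff item as soon as the switch happens (no on_enter timing yet).
        events.append("handoff_item_added")

    await on_enter()

    if delay_handoff_event:
        # Waiting for on_enter_duration means the item can only be emitted now.
        events.append("handoff_item_added")

    return events


immediate = asyncio.run(run_handoff(delay_handoff_event=False))
delayed = asyncio.run(run_handoff(delay_handoff_event=True))
print(immediate)
print(delayed)
```

Dropping on_enter from the handoff metrics, as agreed above, removes the need for the delayed variant.
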


longcw commented Apr 24, 2026

Couldn't find a way to expose the STT connection time from the stt_node.

@longcw longcw closed this Apr 24, 2026
@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 1 new potential issue.

View 5 additional findings in Devin Review.

Comment on lines +1281 to +1285
handoff_start = time.monotonic()
drain_duration = 0.0
new_activity_duration = 0.0
stt_reused = False
rt_session_reused = False
Contributor

🟡 HandoffMetrics duration includes lock acquisition wait time, contradicting its docstring

handoff_start is captured at agent_session.py:1281, before async with self._activity_lock: at line 1287, but total_duration is computed at agent_session.py:1372 inside the lock. If _activity_lock is contended (e.g., an AgentTask handoff overlaps with an update_agent call), duration silently includes the time spent waiting on the lock. This contradicts the HandoffMetrics.duration docstring (livekit-agents/livekit/agents/metrics/base.py:189) which states it measures "drain + close/pause + start/resume"; none of those phases include lock contention. The sub-fields drain_duration and new_activity_duration are measured correctly inside the lock, so duration will always exceed their sum by an unaccounted gap.

Suggested change
handoff_start = time.monotonic()
drain_duration = 0.0
new_activity_duration = 0.0
stt_reused = False
rt_session_reused = False
stt_reused = False
rt_session_reused = False
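
The effect described in this finding can be seen in isolation with a small asyncio sketch. It is not the code from agent_session.py; `timed_section`, `holder`, and the sleep values are hypothetical stand-ins. One timer starts before acquiring a contended lock and the other starts after, so the difference between them is the lock wait.

```python
import asyncio
import time


async def timed_section(lock: asyncio.Lock, work: float) -> tuple[float, float]:
    """Return (duration including lock wait, duration measured inside the lock)."""
    outer_start = time.monotonic()  # captured before acquiring the lock
    async with lock:
        inner_start = time.monotonic()  # captured after acquiring the lock
        await asyncio.sleep(work)  # stand-in for drain + start/resume work
        end = time.monotonic()
    return end - outer_start, end - inner_start


async def main() -> tuple[float, float]:
    lock = asyncio.Lock()

    async def holder() -> None:
        # Simulates another handoff that already holds the activity lock.
        async with lock:
            await asyncio.sleep(0.2)

    holder_task = asyncio.create_task(holder())
    await asyncio.sleep(0)  # yield once so the holder takes the lock first
    result = await timed_section(lock, work=0.05)
    await holder_task
    return result


outer, inner = asyncio.run(main())
print(f"including lock wait: {outer:.3f}s, inside lock only: {inner:.3f}s")
```

Starting the timer inside the lock keeps the total consistent with phases that are also measured inside it.
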
2 participants