feat: add HandoffMetrics for measuring agent handoff timing#5442
Add HandoffMetrics to measure the cost of agent handoffs, including:
- drain_duration: time to drain the old activity (on_exit + pending speech)
- new_activity_duration: time to start/resume the new activity
- on_enter_duration: time for the new agent's on_enter callback
- stt_reused / realtime_session_reused: whether resources were reused
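A minimal sketch of what such a metrics record could look like, using the field names from the description above plus the total duration mentioned in the summary. HandoffMetricsSketch is a hypothetical standalone stand-in, not the livekit-agents definition, and the unaccounted helper property is an illustrative assumption (it treats duration as covering all three phases).

```python
from dataclasses import dataclass


@dataclass
class HandoffMetricsSketch:
    """Hypothetical stand-in for HandoffMetrics; all durations are in seconds."""

    drain_duration: float = 0.0
    """Time to drain the old activity (on_exit + pending speech)."""
    new_activity_duration: float = 0.0
    """Time to start or resume the new activity."""
    on_enter_duration: float = 0.0
    """Time spent in the new agent's on_enter callback."""
    duration: float = 0.0
    """Total wall-clock time of the handoff."""
    stt_reused: bool = False
    """Whether the STT stream was reused from the previous agent."""
    realtime_session_reused: bool = False
    """Whether the realtime session was reused from the previous agent."""

    @property
    def unaccounted(self) -> float:
        # Hypothetical helper: part of the total not covered by any measured phase.
        return self.duration - (
            self.drain_duration + self.new_activity_duration + self.on_enter_duration
        )


m = HandoffMetricsSketch(
    drain_duration=0.12,
    new_activity_duration=0.05,
    on_enter_duration=0.30,
    duration=0.50,
    stt_reused=True,
)
print(f"{m.unaccounted:.2f}")  # 0.03
```

Keeping each phase as its own field makes it easy to spot when the total drifts away from the sum of its parts.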
realtime_session_reused: bool = False
"""Whether the realtime session was reused from the previous agent."""
old_agent_id: str | None = None
new_agent_id: str | None = None
nit: Do we need these two fields if AgentHandoff already has them?
    new_agent_id=handoff_item.new_agent_id,
)
handoff_item.metrics = handoff_metrics
self.emit("metrics_collected", MetricsCollectedEvent(metrics=handoff_metrics))
Having it at the session level makes sense, but we just deprecated the session-level metrics_collected event. Should we put it somewhere else?
Yeah, that's an issue. One option is to put it in AgentHandoff and emit it via self.emit("conversation_item_added", ConversationItemAddedEvent(item=handoff_item)), but the metrics are only available after the handoff item has been emitted.
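The option above can be sketched with a tiny in-process emitter: the metrics are attached to the handoff item, which alone carries old_agent_id/new_agent_id (as the earlier nit suggests), and the item is emitted as a conversation_item_added payload. Emitter, HandoffMetrics, AgentHandoff and ConversationItemAddedEvent here are local stand-ins that only mirror the names; they are not the livekit-agents classes, and the agent ids are made up.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class HandoffMetrics:
    # Hypothetical: timing and reuse only; agent ids are NOT duplicated here.
    drain_duration: float = 0.0
    stt_reused: bool = False


@dataclass
class AgentHandoff:
    # Hypothetical stand-in for the conversation item; it alone carries the agent ids.
    old_agent_id: Optional[str]
    new_agent_id: Optional[str]
    metrics: Optional[HandoffMetrics] = None


@dataclass
class ConversationItemAddedEvent:
    item: AgentHandoff


class Emitter:
    """Tiny synchronous event emitter used only for this sketch."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[Any], None]]] = {}

    def on(self, event: str, handler: Callable[[Any], None]) -> None:
        self._handlers.setdefault(event, []).append(handler)

    def emit(self, event: str, payload: Any) -> None:
        for handler in self._handlers.get(event, []):
            handler(payload)


session = Emitter()
received: List[ConversationItemAddedEvent] = []
session.on("conversation_item_added", received.append)

handoff_item = AgentHandoff(old_agent_id="triage_agent", new_agent_id="billing_agent")
# Metrics must be attached before the item is emitted for listeners to see them.
handoff_item.metrics = HandoffMetrics(drain_duration=0.08, stt_reused=True)
session.emit("conversation_item_added", ConversationItemAddedEvent(item=handoff_item))

event = received[0]
print(event.item.new_agent_id, event.item.metrics.stt_reused)  # billing_agent True
```

The catch raised in the thread is visible in the ordering: the emit can only happen once the metrics object exists.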
Should we delay emitting "conversation_item_added" until the handoff metrics are available?
Wouldn't this cause the handoff event to be emitted after the reply from on_enter?
I see. Semantically, the handoff happens before any reply from on_enter, so maybe we should exclude the new activity's start/resume and on_enter from the handoff metrics. Instead, we can have more generic Agent/AgentTask-level metrics that measure those.
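A minimal asyncio sketch of the ordering concern, assuming on_enter produces a reply immediately: if the handoff item is only emitted once on_enter-inclusive metrics are ready, the reply is observed first; limiting the item's metrics to the handoff itself lets it be emitted before on_enter runs. All names here (drain_old_activity, HandoffItem, the two handoff functions) are hypothetical.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import List


@dataclass
class HandoffItem:
    # Hypothetical stand-in for AgentHandoff carrying only drain timing.
    drain_duration: float = 0.0


events: List[str] = []


async def drain_old_activity() -> None:
    await asyncio.sleep(0.01)  # stands in for on_exit + pending speech


async def on_enter() -> None:
    events.append("reply")  # assume the new agent replies as soon as it enters


async def handoff_emit_after_enter() -> None:
    # Waiting for on_enter-inclusive metrics lets the reply overtake the handoff item.
    item = HandoffItem()
    start = time.monotonic()
    await drain_old_activity()
    item.drain_duration = time.monotonic() - start
    await on_enter()
    events.append("handoff")


async def handoff_emit_before_enter() -> None:
    # Measure only the handoff itself, emit the item, then run on_enter.
    item = HandoffItem()
    start = time.monotonic()
    await drain_old_activity()
    item.drain_duration = time.monotonic() - start
    events.append("handoff")
    await on_enter()


asyncio.run(handoff_emit_after_enter())
delayed_order = list(events)
events.clear()
asyncio.run(handoff_emit_before_enter())
print(delayed_order, events)  # ['reply', 'handoff'] ['handoff', 'reply']
```

Timing on_enter separately (for example at the Agent/AgentTask level) keeps the handoff event in its semantic position.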
yeah I agree, we shouldn't include on_enter in the handoff metrics.
I couldn't find a way to expose the STT connection time from the stt_node.
handoff_start = time.monotonic()
drain_duration = 0.0
new_activity_duration = 0.0
stt_reused = False
rt_session_reused = False
HandoffMetrics duration includes lock acquisition wait time, contradicting its docstring
handoff_start is captured at agent_session.py:1281, before async with self._activity_lock: at line 1287, but total_duration is computed at agent_session.py:1372 inside the lock. If _activity_lock is contended (e.g., an AgentTask handoff overlaps with an update_agent call), duration silently includes the time spent waiting on the lock. This contradicts the HandoffMetrics.duration docstring (livekit-agents/livekit/agents/metrics/base.py:189), which states that it measures "drain + close/pause + start/resume"; none of those phases include lock contention. The sub-fields drain_duration and new_activity_duration are measured correctly inside the lock, so duration will always exceed their sum by an unaccounted gap.
handoff_start = time.monotonic()
drain_duration = 0.0
new_activity_duration = 0.0
stt_reused = False
rt_session_reused = False
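One way to address this, sketched under the assumption that lock contention should be reported separately rather than folded into duration: take handoff_start only after async with self._activity_lock has been entered, and optionally record the wait as its own value. SessionSketch and lock_wait are hypothetical names, and asyncio.sleep stands in for the real work done inside the lock.

```python
import asyncio
import time
from typing import List, Tuple


class SessionSketch:
    """Hypothetical session showing where to start the handoff clock."""

    def __init__(self) -> None:
        self._activity_lock = asyncio.Lock()

    async def update_agent(self, work: float) -> Tuple[float, float]:
        requested = time.monotonic()
        async with self._activity_lock:
            # Start the clock only once the lock is held, so contention is excluded.
            handoff_start = time.monotonic()
            lock_wait = handoff_start - requested
            await asyncio.sleep(work)  # stands in for drain + close/pause + start/resume
            duration = time.monotonic() - handoff_start
        return lock_wait, duration


async def main() -> List[Tuple[float, float]]:
    session = SessionSketch()
    # Two overlapping handoffs: the second waits for the first to release the lock.
    return list(await asyncio.gather(session.update_agent(0.05), session.update_agent(0.05)))


results = asyncio.run(main())
for lock_wait, duration in results:
    print(f"lock_wait={lock_wait:.2f}s duration={duration:.2f}s")
```

With this placement both handoffs report roughly the same duration, and only the second one shows a non-zero lock_wait.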
Summary
- Add HandoffMetrics to measure agent handoff cost, emitted via metrics_collected and attached to AgentHandoff.metrics
- Timing breakdown: drain_duration, new_activity_duration, on_enter_duration, and total duration
- Resource reuse tracking (stt_reused, realtime_session_reused)
Limitation
HandoffMetrics does not include the WebSocket connection time for new realtime sessions or STT streams. When resources are not reused, the connection is established asynchronously in a background task. This connection time is already reported separately via RealtimeModelMetrics.acquire_time and STTMetrics.acquire_time through the existing metrics_collected pipeline.
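The split described above can be illustrated with a small asyncio sketch, assuming a connection task that measures and reports its own acquire time. The handoff spawns it with asyncio.create_task and returns without awaiting it, so the connection latency never appears in the handoff duration. connect_stt_stream, handoff and metrics_log are hypothetical; in livekit-agents the corresponding value surfaces as STTMetrics.acquire_time / RealtimeModelMetrics.acquire_time.

```python
import asyncio
import time
from typing import List, Set, Tuple

metrics_log: List[Tuple[str, float]] = []


async def connect_stt_stream() -> None:
    # Stands in for opening a new STT WebSocket; reports its own acquire time.
    start = time.monotonic()
    await asyncio.sleep(0.05)
    metrics_log.append(("stt_acquire_time", time.monotonic() - start))


async def handoff(background: Set[asyncio.Task]) -> None:
    start = time.monotonic()
    # The connection is not awaited here, so its latency stays outside the handoff duration.
    task = asyncio.create_task(connect_stt_stream())
    background.add(task)  # keep a reference so the task is not garbage-collected
    task.add_done_callback(background.discard)
    await asyncio.sleep(0.01)  # stands in for drain + start of the new activity
    metrics_log.append(("handoff_duration", time.monotonic() - start))


async def main() -> None:
    background: Set[asyncio.Task] = set()
    await handoff(background)
    await asyncio.gather(*background)  # let the connection finish before exiting


asyncio.run(main())
for name, value in metrics_log:
    print(f"{name}: {value:.2f}s")
```

The handoff metric is recorded first and stays small, while the slower connection reports its cost through its own metric.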