Skip to content

feat(rollout): decouple policy deployment from data recording with new lerobot-rollout CLI#3413

Open
imstevenpmwork wants to merge 6 commits intomainfrom
feat/lerobot-rollout
Open

feat(rollout): decouple policy deployment from data recording with new lerobot-rollout CLI#3413
imstevenpmwork wants to merge 6 commits intomainfrom
feat/lerobot-rollout

Conversation

@imstevenpmwork
Copy link
Copy Markdown
Collaborator

@imstevenpmwork imstevenpmwork commented Apr 19, 2026

Summary / Motivation

  • Introduces a clean architectural split between data collection (lerobot-record) and policy deployment (lerobot-rollout). Previously, lerobot-record handled teleoperation, policy inference, RTC async inference, and DAgger-style human-in-the-loop corrections in a single script with growing conditional complexity. This PR extracts all policy-related execution into a new lerobot.rollout module with a dedicated CLI, leaving lerobot-record as a focused teleoperation-only tool.
  • The new module follows a Strategy + Factory + Dependency Injection architecture: strategies are pluggable (base, sentry, highlight, dagger), inference backends are swappable (sync, rtc), and all shared state is assembled once into a RolloutContext bundle. This makes each deployment mode independently testable and extensible without touching the others.

Related issues

  • Related: N/A

What changed

  • New src/lerobot/rollout/ module — Complete policy deployment engine with:
    • 4 strategies: BaseStrategy (autonomous, no recording), SentryStrategy (continuous recording + auto-upload), HighlightStrategy (ring buffer + keystroke save), DAggerStrategy (human-in-the-loop RaC with keyboard/pedal input)
    • 2 inference engines: SyncInferenceEngine (inline) and RTCInferenceEngine (async background thread with action queue)
    • RolloutContext dependency injection bundle with sub-contexts (runtime, hardware, policy, processor, dataset)
    • ThreadSafeRobot wrapper for lock-protected concurrent hardware access
    • RolloutRingBuffer memory-bounded circular buffer for highlight capture
    • Polymorphic CLI dispatch via draccus ChoiceRegistry (--strategy.type=..., --inference.type=...)
  • New lerobot-rollout CLI (src/lerobot/scripts/lerobot_rollout.py) registered in pyproject.toml
  • Simplified lerobot-record — Stripped all policy/inference code; teleop is now required.
  • Extracted shared utilities:
    • DatasetRecordConfigsrc/lerobot/configs/dataset.py (shared between record and rollout)
    • ActionInterpolatorsrc/lerobot/utils/action_interpolator.py (decoupled from RTC policy module)
    • start_pedal_listenersrc/lerobot/utils/pedal.py (evdev foot pedal support with graceful degradation)
  • Thread safety fix in policies/rtc/action_queue.py — added missing lock guards on qsize(), empty(), get_action_index()
  • Removed examples/hil/hil_data_collection.py (1184 lines), examples/hil/hil_utils.py, examples/rtc/eval_with_real_robot.py (673 lines) — replaced by lerobot-rollout
  • New examplesrollout.py and updated evaluate.py for lekiwi, phone-to-so100, and so100-EE setups
  • New docsdocs/source/inference.mdx covering all strategies and examples; updated hil_data_collection.mdx, il_robots.mdx, rtc.mdx

How was this tested (or how to run locally)

  • Full Tests CI run: https://github.com/huggingface/lerobot/actions/runs/24641227376
  • Minimal Tests added: tests/test_rollout.py — covers config validation, ring buffer, thread-safe robot, strategy/inference factory dispatch, DAgger state machine transitions, and context dataclass fields. Run with pytest -q tests/test_rollout.py -svv
  • Basic Manual HW checks, 2 cameras & OMX robot (mostly for testing the rollout strategies):
    • lerobot-rollout --strategy.type=base
    • lerobot-rollout --strategy.type=sentry
    • lerobot-rollout --strategy.type=dagger with keyboard
    • lerobot-rollout --strategy.type=highlight with keyboard
    • lerobot-record pure teleoperation — no regressions
  • Deep E2E tests (same setup):

Checklist (required before merge)

  • Linting/formatting run (pre-commit run -a)
  • All tests pass locally (pytest)
  • Documentation updated
  • CI is green
  • Community Review: I have reviewed another contributor's open PR and linked it here: #

Reviewer notes

@HuggingFaceDocBuilderDev
Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@imstevenpmwork imstevenpmwork self-assigned this Apr 19, 2026
@imstevenpmwork imstevenpmwork marked this pull request as ready for review April 19, 2026 23:01
@github-actions github-actions Bot added documentation Improvements or fixes to the project’s docs policies Items related to robot policies dataset Issues regarding data inputs, processing, or datasets tests Problems with test coverage, failures, or improvements to testing configuration Problems with configuration files or settings examples Issues related to the examples labels Apr 19, 2026
@huggingface huggingface deleted a comment from claude Bot Apr 19, 2026
@imstevenpmwork
Copy link
Copy Markdown
Collaborator Author

@claude Review this PR

@claude
Copy link
Copy Markdown

claude Bot commented Apr 20, 2026

Claude finished @imstevenpmwork's task in 4m 1s —— View job


PR Review: lerobot-rollout CLI

I read all 54 changed files in detail. This is a well-structured PR that cleanly separates policy deployment from data recording. The Strategy + Factory + DI architecture is solid and the code is well-documented. Below is my full technical review.


Overall Assessment

Architecture: The decomposition into strategies (base/sentry/highlight/dagger), inference backends (sync/rtc), and context injection is clean and extensible. Each strategy is self-contained and the InferenceEngine ABC allows backend-agnostic action dispatch via send_next_action. Strong separation of concerns.

Code quality: Consistently documented, good logging, clean module boundaries. The codebase follows existing LeRobot patterns (draccus ChoiceRegistry, lazy imports, __all__ exports).


Bugs & Correctness Issues

1. DatasetRecordConfig.__post_init__ always appends timestamp — breaks --resume
src/lerobot/configs/dataset.py:74-77

def __post_init__(self) -> None:
    if self.repo_id:
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        self.repo_id = f"{self.repo_id}_{timestamp}"

When --resume=true, build_rollout_context calls LeRobotDataset.resume(cfg.dataset.repo_id) — but repo_id already has a fresh timestamp appended, so it will attempt to resume a dataset that doesn't exist. The timestamp should be conditional on not resuming, or this logic should be handled at the call site rather than in __post_init__.

Fix this →

2. DAgger strategy missing dataset validation in RolloutConfig.__post_init__
src/lerobot/rollout/configs.py:191-193

needs_dataset = isinstance(self.strategy, (SentryStrategyConfig, HighlightStrategyConfig))

DAggerStrategyConfig is not included in needs_dataset, but both _run_continuous and _run_corrections_only call dataset.add_frame() and dataset.save_episode() unconditionally. A missing --dataset.repo_id with DAgger will crash at runtime instead of failing fast at config validation.

Fix this →

3. DAgger correction phase incorrectly applies record_stride to human frames
src/lerobot/rollout/strategies/dagger.py:449 and 615

record_stride = max(1, cfg.interpolation_multiplier)
# ...
if record_tick % record_stride == 0:  # in CORRECTING phase
    dataset.add_frame(frame)
record_tick += 1

When interpolation_multiplier > 1, this skips every Nth human correction frame. But during corrections, the teleoperator provides frames at base FPS — there's no interpolation. This would discard human demonstration data. record_stride should only apply during the autonomous phase.

Fix this →

4. send_next_action silently produces partial action dicts
src/lerobot/rollout/strategies/core.py:269

action_dict = {k: interp[i].item() for i, k in enumerate(ordered_keys) if i < len(interp)}

If the interpolated tensor has fewer elements than ordered_action_keys, some keys are silently dropped. Sending a partial action dict to the robot could cause unpredictable hardware behavior. This should be a hard error, or at minimum a warning.

5. Highlight strategy has no episode lock on background push
src/lerobot/rollout/strategies/highlight.py:257-277_background_push calls safe_push_to_hub without _episode_lock, unlike Sentry (sentry.py:211-222) and DAgger (dagger.py:719-730) which both hold the lock. If the main loop calls dataset.save_episode() concurrently with a push, there could be a data race on the dataset's on-disk state.


Design Concerns

6. Global default constants changed — affects all users
src/lerobot/datasets/utils.py:74-75

DEFAULT_DATA_FILE_SIZE_IN_MB = 50   # was 100
DEFAULT_VIDEO_FILE_SIZE_IN_MB = 100  # was 200

These constants are used throughout the dataset system, not just by rollout. Halving the chunk sizes changes episode rotation behavior for existing pipelines, training scripts, and any code that depends on these defaults. This should be called out in the PR description and potentially discussed separately.

7. PEFT-via-record test removed with no rollout equivalent
tests/test_cli_peft.py — 82 lines deleted (test_peft_record_loads_policy). This verified that PEFT adapters could be loaded through the record CLI with policy inference. Since lerobot-record no longer loads policies, the test was removed — but there's no equivalent test for PEFT loading through lerobot-rollout. This is a test coverage gap for a supported workflow.

8. No integration tests for strategy control loops
The tests in test_rollout.py cover configs, ring buffer, state machine, and factories — but none exercise a strategy's run() method, even with mocked hardware. The complex control loops in sentry, highlight, and dagger (episode rotation, push timing, state transitions during recording) are only tested manually per the PR description.


Minor Issues

9. Dead code: _teleop_smooth_move_to
src/lerobot/rollout/strategies/dagger.py:173-194 — This helper is defined but never called. All 5 call sites are commented out with TODO notes. Should either be removed or the TODO resolved.

10. Duplicate frame in Highlight ring buffer after episode save
src/lerobot/rollout/strategies/highlight.py:170-187 — When the user presses save during live recording, the frame at line 170 is added to the dataset and the episode is saved. Then at line 184, _recording_live is now False so the same frame is appended to the ring buffer at line 187. This means the first frame of the next highlight will duplicate the last frame of the previous episode.

11. RolloutRingBuffer single-threaded assumption has no runtime guard
src/lerobot/rollout/ring_buffer.py:41-44 — The docstring notes the buffer is single-threaded, but there's no assertion or lock to catch accidental misuse. Since _current_bytes accounting would break under concurrent access, a lightweight debug-mode check could prevent subtle bugs.

12. context.py:build_rollout_context is ~200 lines
The function performs 8 sequential phases (policy loading, processors, hardware, features, dataset, policy processors, inference, assembly). While the ordering matters for fail-fast behavior, extracting some phases into named helpers would improve readability.


What Looks Good

  • Thread-safe robot wrapper (robot_wrapper.py) — Clean lock-protected I/O with read-only property proxying. Correct design for RTC concurrent access.
  • DAgger state machine (dagger.py:88-164) — Transition validation table with thread-safe consume/request pattern is robust and well-tested.
  • Ring buffer memory accounting (ring_buffer.py) — Proper byte estimation for torch tensors, numpy arrays, and primitives. Eviction loop is correct.
  • RTC error handling (rtc.py) — Consecutive error counting with threshold, graceful global shutdown propagation, and compile warmup management are production-quality.
  • Action interpolator extraction — Clean decoupling from RTC module to utils/ with backward-compat re-export.
  • Signal handler consolidationProcessSignalHandler moved from rl/process.py to utils/process.py for shared use across record and rollout.

Summary

This is a significant, well-designed architectural improvement. The separation of lerobot-record (teleop-only) from lerobot-rollout (policy deployment) is a clean split that will reduce maintenance burden and enable independent strategy evolution.

Must-fix before merge: Issues 1-3 (resume broken by timestamp, DAgger dataset validation, correction frame loss).
Should-fix: Issues 4-5 (partial action dict, highlight push lock).
Worth discussing: Issue 6 (global default changes) and test coverage (7-8).

whats2000

This comment was marked as resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

configuration Problems with configuration files or settings dataset Issues regarding data inputs, processing, or datasets documentation Improvements or fixes to the project’s docs examples Issues related to the examples policies Items related to robot policies tests Problems with test coverage, failures, or improvements to testing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants