feat(dataviewer): support multiple camera angles in episode viewer #436

@akzaidi

Description

Component

data-management/viewer/ (frontend + backend)

Problem Statement

The dataviewer episode viewer currently renders a single camera stream per episode. Bimanual VLA policies (TwinVLA, π₀, RDT-1B) use up to three camera views during training and inference:

  • Front/ego-centric camera — workspace overview
  • Left wrist camera — left arm end-effector view
  • Right wrist camera — right arm end-effector view

RoboTwin 2.0 datasets include front_image, wrist_image_left, and wrist_image_right per episode step. LeRobot datasets may include additional camera keys. Without multi-camera visualization, operators cannot:

  1. Verify camera coverage and alignment across views
  2. Inspect occlusion or lighting issues in individual camera streams
  3. Correlate spatial relationships between arm-mounted and workspace cameras
  4. Quality-check the exact visual inputs the VLA model receives during training

Proposed Solution

Add multi-camera display support to the episode viewer:

  1. Auto-detect camera keys from LeRobot dataset metadata (scan observation.images.* keys)
  2. Grid layout — display all camera views simultaneously in a responsive grid (1×1 for single camera, 1×3 for three cameras, 2×2 for four, etc.)
  3. Camera selector — allow toggling individual cameras on/off via a camera panel
  4. Synchronized scrubbing — all camera views stay frame-synchronized when scrubbing the timeline
  5. Per-camera zoom — click a camera view to expand it to full width while keeping others visible as thumbnails
  6. Camera labels — display the LeRobot image key name as an overlay on each view
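
The layout and toggle behaviour in the list above can be sketched as pure state functions. This is a minimal sketch, assuming the viewer frontend is written in TypeScript; `ViewerState`, `gridShape`, `visibleCameras`, `toggleCamera`, and `toggleExpanded` are hypothetical names, not existing viewer code. The near-square rule for five or more cameras is an assumption that extends the 1×1, 1×3, and 2×2 examples.

```typescript
// Hypothetical view state for the episode viewer (illustrative names only).
interface ViewerState {
  cameraKeys: string[];      // all detected camera keys, in display order
  hidden: Set<string>;       // cameras toggled off in the camera panel
  expanded: string | null;   // camera currently expanded to full width, if any
}

// Grid shape for n visible cameras: one row for up to three views,
// then a near-square grid (4 -> 2x2, 5 or 6 -> 2x3, 7 to 9 -> 3x3).
function gridShape(n: number): { rows: number; cols: number } {
  if (n <= 0) return { rows: 0, cols: 0 };
  if (n <= 3) return { rows: 1, cols: n };
  const cols = Math.ceil(Math.sqrt(n));
  return { rows: Math.ceil(n / cols), cols };
}

// Cameras that should currently be rendered, in display order.
function visibleCameras(state: ViewerState): string[] {
  return state.cameraKeys.filter((key) => !state.hidden.has(key));
}

// Toggle one camera on or off. Hiding the expanded camera collapses
// the view back to the grid.
function toggleCamera(state: ViewerState, key: string): ViewerState {
  const hidden = new Set(state.hidden);
  if (hidden.has(key)) {
    hidden.delete(key);
  } else {
    hidden.add(key);
  }
  const expanded =
    hidden.has(key) && state.expanded === key ? null : state.expanded;
  return { ...state, hidden, expanded };
}

// Click handler logic: the first click expands a view, a second click
// on the same view returns to the grid.
function toggleExpanded(state: ViewerState, key: string): ViewerState {
  return { ...state, expanded: state.expanded === key ? null : key };
}
```

Keeping these functions free of DOM access means the layout rules can be unit-tested without rendering; a component would call `gridShape(visibleCameras(state).length)` on each render.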

Technical Notes

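  • LeRobot v3.0 stores images as MP4 videos (nominally videos/{camera_key}_episode_{idx}.mp4; resolve the actual path from the video_path template in info.json rather than hard-coding it)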
  • Camera keys are the entries of features in the dataset's info.json whose names start with observation.images.
  • The backend already streams video frames; the change is primarily frontend layout and state management
  • Consider using CSS grid with auto-fit for responsive layout
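
As a rough illustration of the notes above, camera detection and the responsive grid value might look like the following. This is a sketch under assumptions: `DatasetInfo` models only the part of info.json that is read here, and `detectCameraKeys` and `gridTemplateColumns` are hypothetical helpers, not existing viewer or LeRobot functions.

```typescript
// Assumed minimal shape of info.json: a flat `features` map keyed by
// feature name (for example "observation.images.front").
interface FeatureSpec {
  dtype: string;
}
interface DatasetInfo {
  features: Record<string, FeatureSpec>;
}

const IMAGE_PREFIX = "observation.images.";

// Return every feature key under observation.images.*, sorted so the
// grid order is stable across reloads.
function detectCameraKeys(info: DatasetInfo): string[] {
  return Object.keys(info.features)
    .filter((key) => key.startsWith(IMAGE_PREFIX))
    .sort();
}

// CSS value for `grid-template-columns`: as many columns as fit, each at
// least `minWidthPx` wide. The minimum width is a tunable assumption.
function gridTemplateColumns(minWidthPx: number): string {
  return `repeat(auto-fit, minmax(${minWidthPx}px, 1fr))`;
}

// Example with a three-camera dataset plus non-image features.
const exampleInfo: DatasetInfo = {
  features: {
    "observation.images.wrist_right": { dtype: "video" },
    "observation.images.front": { dtype: "video" },
    "observation.images.wrist_left": { dtype: "video" },
    "observation.state": { dtype: "float32" },
    action: { dtype: "float32" },
  },
};
const exampleKeys = detectCameraKeys(exampleInfo);
// exampleKeys: front, wrist_left, wrist_right (full key names, sorted)
```

Filtering by key prefix rather than by a fixed camera list keeps single-camera datasets and datasets with extra cameras on the same code path.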

Acceptance Criteria

  • Episode viewer detects and displays all available camera streams from the dataset
  • Camera views are frame-synchronized with the timeline scrubber
  • Individual cameras can be toggled on/off
  • Clicking a camera view expands it; clicking again returns to grid
  • Camera key names are displayed as overlay labels
  • Works with single-camera datasets (no regression)
  • Works with 3-camera bimanual datasets (RoboTwin 2.0, TwinVLA Tabletop-Sim)
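
One way to make the frame-synchronization criterion testable is to treat the scrubber's frame index as the single source of truth and derive every view's seek time from it. The sketch below assumes all cameras share one frame rate and that each view exposes a writable `currentTime` (as `HTMLVideoElement` does); `SeekableView`, `clampFrame`, `frameToSeconds`, and `syncViews` are hypothetical names.

```typescript
// Anything that can be seeked by time in seconds. HTMLVideoElement
// satisfies this, and so do plain objects in tests.
interface SeekableView {
  currentTime: number;
}

// Clamp a requested frame to the valid range [0, totalFrames - 1].
function clampFrame(frameIndex: number, totalFrames: number): number {
  const last = Math.max(0, totalFrames - 1);
  return Math.min(Math.max(0, Math.round(frameIndex)), last);
}

// Convert a frame index to a seek time. `startSeconds` is an assumed
// offset for cases where the episode does not start at time zero in its
// video file.
function frameToSeconds(frameIndex: number, fps: number, startSeconds = 0): number {
  if (fps <= 0) throw new RangeError("fps must be positive");
  return startSeconds + frameIndex / fps;
}

// Seek every view to the same frame and return the frame actually shown.
function syncViews(
  views: SeekableView[],
  frameIndex: number,
  totalFrames: number,
  fps: number,
): number {
  const frame = clampFrame(frameIndex, totalFrames);
  const seconds = frameToSeconds(frame, fps);
  for (const view of views) {
    view.currentTime = seconds;
  }
  return frame;
}
```

Deriving time from the frame index, rather than the reverse, avoids off-by-one frames caused by floating-point rounding when the scrubber reports a time.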

Labels: area/src, enhancement
