Skip to content

Keep segmentation compatible with typed mujoco-warp#911

Open
tkelestemur wants to merge 4 commits intomujocolab:mainfrom
tkelestemur:tarik/mjwarp-segmentation-compat
Open

Keep segmentation compatible with typed mujoco-warp#911
tkelestemur wants to merge 4 commits intomujocolab:mainfrom
tkelestemur:tarik/mjwarp-segmentation-compat

Conversation

@tkelestemur
Copy link
Copy Markdown

This updates mjlab for the segmentation API change in google-deepmind/mujoco_warp#1283 while keeping mjlab's public camera API backward-compatible. mujoco_warp now exposes segmentation as (object_id, object_type) pairs (once the upstream PR is merged(, but mjlab still expects per-pixel geom IDs with -1 for background and -2 for flex hits. To preserve that contract, this adds a small Warp normalization step in the render/sense pipeline that converts the typed segmentation buffer back into mjlab's existing (B, H, W, 1) int32 output before camera data is exposed to sensors, viewers, or manipulation observations. This means downstream code such as camera_segmentation() and camera_target_cube_mask() continues to work unchanged.

This PR also bumps the pinned mujoco-warp revision to the current head commit for that upstream change and adds regression coverage for both the normalization logic and the downstream observation helpers that consume segmentation.

I validated the change with make check, make test, and make docs locally, and on. The main non-obvious tradeoff is that, because the upstream PR is not merged yet, this temporarily pins mjlab to the PR head commit rather than a merged upstream release.

Copy link
Copy Markdown
Collaborator

@kevinzakka kevinzakka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this on @tkelestemur!

Reading the review thread on google-deepmind/mujoco_warp#1283, it sounds like the maintainers are okay with the breaking change. Given how new camera segmentation is in mjlab, could we just propagate the break instead of shimming the typed buffer back to the old (-1, -2, geom_id) layout?

WDYT?

@tkelestemur
Copy link
Copy Markdown
Author

Thanks @kevinzakka

I agree. I updated the PR to propagate the upstream typed segmentation break instead of shimming it back to the legacy (-1, -2, geom_id) layout. CameraSensorData.segmentation now exposes (object_id, object_type) pairs with shape [B, H, W, 2], and I updated the downstream consumers that depended on the old format, including the target-cube mask logic and the Viser segmentation view.

I also bumped the mujoco_warp pin to the latest head of google-deepmind/mujoco_warp#1283.

Let me know what you think!

@StafaH
Copy link
Copy Markdown

StafaH commented Apr 16, 2026

@kevinzakka I think it makes more sense for the output to be 1 channel rather than 2 for training. I think the original conversion should just be updated so that the flex objects get a unique integer id after the regular geoms.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants