fix(sarm): handle BaseModelOutputWithPooling from transformers 5.x in… by masato-ka · Pull Request #3419 · huggingface/lerobot

masato-ka · 2026-04-20T15:06:17Z

Summary / Motivation

In transformers 5.x, CLIPModel.get_image_features() and get_text_features()
return a BaseModelOutputWithPooling object instead of a plain torch.FloatTensor.
This caused an AttributeError: 'BaseModelOutputWithPooling' object has no attribute 'detach' when running SARM policies with transformers 5.x.

This PR adds an isinstance check in SARMEncodingProcessorStep to extract
pooler_output when the return value is not a plain tensor, maintaining full
backward compatibility with transformers 4.x.

Related issues

Fixes: [Bug] SARM training fails with AttributeError: 'BaseModelOutputWithPooling' object has no attribute 'detach' on transformers 5.x #3418

What changed

src/lerobot/policies/sarm/processor_sarm.py: In both _encode_images() and
_encode_text(), added a guard to unwrap BaseModelOutputWithPooling.pooler_output
when get_image_features() / get_text_features() does not return a plain
torch.Tensor. No behavioral change under transformers 4.x.
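
A minimal sketch of the guard described above, not the exact code in processor_sarm.py. To keep it runnable without torch or transformers, FakeTensor and FakePooledOutput are hypothetical stand-ins for torch.Tensor and BaseModelOutputWithPooling, and unwrap_features is an illustrative helper name that does not exist in the repository. In real code, tensor_type would be torch.Tensor.

```python
from dataclasses import dataclass
from typing import Any


class FakeTensor:
    """Hypothetical stand-in for torch.Tensor (keeps the sketch dependency-free)."""

    def detach(self) -> "FakeTensor":
        return self


@dataclass
class FakePooledOutput:
    """Hypothetical stand-in for transformers' BaseModelOutputWithPooling."""

    pooler_output: Any


def unwrap_features(output: Any, tensor_type: type = FakeTensor) -> Any:
    """Return `output` unchanged if it is already a tensor, else its pooler_output."""
    if not isinstance(output, tensor_type):
        output = output.pooler_output
    return output


features = FakeTensor()
# transformers 4.x style: the call already returned a plain tensor.
assert unwrap_features(features) is features
# transformers 5.x style: the tensor is wrapped in an output object.
assert unwrap_features(FakePooledOutput(pooler_output=features)) is features
```

Because the check is on the tensor type rather than on the transformers version, the same code path should work under both 4.x and 5.x without a version comparison.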

How was this tested (or how to run locally)

Manual verification: reproduced the AttributeError with transformers 5.x,
confirmed it is resolved after this fix.
No dedicated unit test added (SARM tests require hardware/large model downloads);
the fix is a two-line guard and straightforward to inspect.

To reproduce the original error:

pip install "transformers>=5.0"
# Run any SARM encode step that calls get_image_features / get_text_features

Checklist (required before merge)

Linting/formatting run (pre-commit run -a)
All tests pass locally (pytest)
[] Documentation updated (not needed)
[] CI is green
[] Community Review: I have reviewed another contributor's open PR and linked it
here: # (insert PR number/link)

Reviewer notes

The only changed file is src/lerobot/policies/sarm/processor_sarm.py, two
symmetric hunks (one for image, one for text encoding).
The fix relies on pooler_output, the attribute that BaseModelOutputWithPooling
uses for pooled features; under transformers 5.x it is expected to hold what the
old plain-tensor return contained.
Anyone in the community is free to review the PR.

… CLIP encoding

In transformers 5.x, CLIPModel.get_image_features() and get_text_features() return BaseModelOutputWithPooling instead of a plain torch.FloatTensor. Added isinstance check to extract pooler_output when the return value is not a tensor, maintaining backward compatibility with transformers 4.x.

Fixes AttributeError: 'BaseModelOutputWithPooling' object has no attribute 'detach'

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

ChuyaoShen · 2026-04-20T17:45:59Z

+            # transformers 5.x returns BaseModelOutputWithPooling instead of a plain tensor
+            output = self.clip_model.get_image_features(**inputs)
+            if not isinstance(output, torch.Tensor):
+                output = output.pooler_output


nit: maybe we should assert output is not None to please mypy.
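
One possible reading of this suggestion, as a sketch only: it assumes pooler_output is annotated as Optional on the output class, which is why a type checker would complain. FakeTensor, FakePooledOutput, and unwrap_checked are hypothetical names used for illustration, standing in for torch.Tensor, BaseModelOutputWithPooling, and the guarded code path.

```python
from dataclasses import dataclass
from typing import Optional, Union


class FakeTensor:
    """Hypothetical stand-in for torch.Tensor."""


@dataclass
class FakePooledOutput:
    """Hypothetical stand-in for BaseModelOutputWithPooling (field assumed Optional)."""

    pooler_output: Optional[FakeTensor] = None


def unwrap_checked(output: Union[FakeTensor, FakePooledOutput]) -> FakeTensor:
    if isinstance(output, FakeTensor):
        return output
    pooled = output.pooler_output
    # Narrows Optional[FakeTensor] to FakeTensor for the type checker and
    # fails early at runtime if no pooled features were returned.
    assert pooled is not None, "expected pooler_output to be set"
    return pooled
```

The assert turns a later, less obvious failure on None into an immediate one at the unwrap site.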

github-actions bot added the policies label (Items related to robot policies) Apr 20, 2026

pkooij self-requested a review April 20, 2026 17:24

Merge branch 'main' into fix/sarm-adapt-to-transformer5.x

fb0efe7

ChuyaoShen reviewed Apr 20, 2026

ChuyaoShen mentioned this pull request Apr 20, 2026

fix(policies): remove @dataclass from GR00TN15Config to fix import crash on transformers ≥ 5.5 #3414

Open

5 tasks

Merge branch 'main' into fix/sarm-adapt-to-transformer5.x

2ce36ea

masato-ka wants to merge 3 commits into huggingface:main from masato-ka:fix/sarm-adapt-to-transformer5.x
