Bug description
Context
RL code on 04/23 started failing with errors like:
File "/home/lucaskabela/torchtitan/torchtitan/experiments/rl/models/vllm_wrapper.py", line 270, in forward
def forward(
File "/home/lucaskabela/pytorch/torch/_dynamo/eval_frame.py", line 1297, in _fn
return fn(*args, **kwargs)
File "/home/lucaskabela/vllm/vllm/compilation/caching.py", line 215, in __call__
return self.optimized_call(*args, **kwargs)
File "<string>", line 316, in execution_fn
NameError: name 'Shard' is not defined
This is because `vllm/compilation/codegen.py::_node_ref` falls back to `repr(arg)` when stringifying non-primitive graph arguments (line 155). The generated source is `exec`'d in a namespace that only has `import torch` (line 101, line 130). For any arg whose `repr()` is not eval-able in that torch-only namespace, the generated `execution_fn` raises `NameError` on first call.
`torch.device` is the most common example: `repr(torch.device('cuda:0'))` returns
the bare string `"device(type='cuda', index=0)"`, which cannot be evaluated with only
`torch` in scope. Any model whose forward lifts a device into a graph arg (e.g. via
`.to(device)`, `torch.zeros(..., device=d)`, or tensor factories inside a traced op)
will hit this. DTensor placement types (`Shard`, `Replicate`, `Partial`) have the same
shape: `repr(Shard(dim=2))` is `'Shard(dim=2)'`.
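A minimal sketch of the failure mode, standalone so it needs neither torch nor vLLM. The `Shard` class here is a stand-in for DTensor's placement type, and `make_execution_fn` is a hypothetical helper that mirrors the codegen pattern (repr-stringify an arg, then `exec` the generated source in a namespace lacking the arg's defining module):

```python
class Shard:
    """Stand-in for DTensor's Shard placement type (hypothetical)."""
    def __init__(self, dim):
        self.dim = dim
    def __repr__(self):
        return f"Shard(dim={self.dim})"

def make_execution_fn(arg):
    # Mirrors the codegen pattern: embed repr(arg) in generated source,
    # then exec it in a namespace that does not import the arg's module
    # (the real code's namespace contains only `import torch`).
    src = f"def execution_fn():\n    return {arg!r}\n"
    namespace = {}
    exec(src, namespace)  # compiles fine; repr is only evaluated on call
    return namespace["execution_fn"]

fn = make_execution_fn(Shard(dim=2))
try:
    fn()  # first call evaluates "Shard(dim=2)" -> NameError
except NameError as e:
    print(f"NameError: {e}")
```

Running this prints `NameError: name 'Shard' is not defined`, matching the traceback above: the generated source is syntactically valid, so the error only surfaces on the first call of `execution_fn`.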
We have worked around this with a monkeypatch, but we need a proper upstream fix, and the monkeypatch should be removed once that fix lands.
Versions
vllm at commit 595562651 (main).