
🐛 [Bug] Issue in conversion when parameters/buffers are moved during compilation #2658

@gs-olive

Description

Bug Description

Bug 1

```
  File "/root/.pyenv/versions/3.10.13/lib/python3.10/site-packages/torch_tensorrt/dynamo/conversion/converter_utils.py", line 491, in to_numpy
    output = value.cpu().detach().contiguous().numpy()
RuntimeError: .numpy() is not supported for tensor subclasses.
```

Suggested Fix 1

A custom version of the following function is needed, one that registers a parameter rather than a buffer:

```python
replace_node_with_constant(gm, node, constant)
```
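As a sketch of what such a custom version might look like (the function name `replace_node_with_parameter` and the `_frozen_param_*` naming scheme are illustrative assumptions, not the Torch-TensorRT API):

```python
import torch
import torch.nn as nn


def replace_node_with_parameter(
    gm: torch.fx.GraphModule, node: torch.fx.Node, constant: torch.Tensor
) -> None:
    # Variant of replace_node_with_constant that stores the folded value
    # via register_parameter instead of register_buffer, so downstream
    # passes (e.g. to_numpy) see an nn.Parameter.
    name = f"_frozen_param_{node.name}"
    with gm.graph.inserting_before(node):
        new_node = gm.graph.get_attr(name)
        new_node.meta.update(node.meta)
        node.replace_all_uses_with(new_node)
    gm.graph.erase_node(node)
    gm.register_parameter(name, nn.Parameter(constant.detach(), requires_grad=False))
    gm.recompile()
```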

Bug 2

```
File "/root/.pyenv/versions/3.10.13/lib/python3.10/site-packages/torch/_ops.py", line 571, in __call__
    return self_._op(*args, **kwargs)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and meta! (when checking argument for argument mat2 in method wrapper_CUDA_mm)
```

Suggested Fix 2

Constant tensors need to be cast to `nn.Parameter` on CUDA at constant-folding time, in:

```python
replace_node_with_constant(gm, node, constant)
```
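One way this could look at constant-folding time (a minimal sketch; the helper name and how the target device is plumbed through are assumptions):

```python
import torch
import torch.nn as nn


def freeze_constant_on_device(
    constant: torch.Tensor, device: torch.device
) -> nn.Parameter:
    # Materialize the folded constant on the compilation device (e.g. cuda:0)
    # before it is registered on the module, so downstream ops such as mm
    # never see a mix of cuda and meta tensors.
    return nn.Parameter(constant.detach().to(device=device), requires_grad=False)
```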

Bug 3

```
File "<eval_with_key>.67 from /root/.pyenv/versions/3.10.13/lib/python3.10/site-packages/torch/fx/experimental/proxy_tensor.py:569 in wrapped", line 11, in forward
File "/root/.pyenv/versions/3.10.13/lib/python3.10/site-packages/torch/_ops.py", line 571, in __call__
  return self_._op(*args, **kwargs)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and meta! (when checking argument for argument mat2 in method wrapper_CUDA_mm)
```

Suggested Fix 3

The following line needs to be removed, as it has unintended behavior when casting constant parameters:

```python
module.to(to_torch_device(settings.device))
```
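If the whole-module move is removed, only the live runtime inputs still need to reach the target device; a minimal sketch of that idea (hypothetical helper, not the Torch-TensorRT API):

```python
import torch


def inputs_to_device(inputs, device):
    # Move only the runtime inputs to the compilation device, leaving the
    # constant-folded parameters wherever the folding pass placed them,
    # instead of calling .to() on the entire module.
    return tuple(
        t.to(device) if isinstance(t, torch.Tensor) else t for t in inputs
    )
```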

Expected behavior

The model should compile successfully.

Environment

  • Torch and Torch-TensorRT Version: 2.3.0.dev2024222+cu121

Metadata

Labels

bug: Something isn't working
