nn.Module subclasses uncollected by get_all_subclasses raise RecursionError after materialization #13042

@akihironitta

🐛 Bug

Materializing an nn.Module subclass that is not collected by get_all_subclasses sets its materialized module to its own child. As a result, attribute access such as model.layer.layer succeeds when it should not, and printing the model raises RecursionError.
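To illustrate why a module that ends up as its own child breaks printing, here is a minimal pure-Python sketch. `Node` is a hypothetical stand-in, not PyTorch's or Lightning's code; it only assumes that a module's repr recurses into each registered child, as the traceback below suggests.

```python
class Node:
    """Hypothetical stand-in for a module whose repr prints its children."""

    def __init__(self, name):
        self.name = name
        self.children = {}

    def __repr__(self):
        # Recurse into every registered child, like a nested module repr.
        inner = ", ".join(f"{k}={v!r}" for k, v in self.children.items())
        return f"{self.name}({inner})"


layer = Node("Linear")
model = Node("Model")
model.children["layer"] = layer
healthy_repr = repr(model)  # finite: the tree has no cycle

# Simulate the faulty state: the child is registered under itself.
layer.children["layer"] = layer

try:
    repr(model)
    raised = False
except RecursionError:
    raised = True
```

Once the cycle exists, `layer.children["layer"].children["layer"]` resolves indefinitely, which mirrors `model.layer.layer.layer` not raising AttributeError in the repro.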

To Reproduce

Here’s the minimal code for repro:

from pytorch_lightning.demos.boring_classes import BoringModel
from pytorch_lightning.utilities.meta import materialize_module, init_meta_context

with init_meta_context():
    model = BoringModel()  # for some reason, BoringModel is not in `pl.utilities.meta.get_all_subclasses(torch.nn.Module)`

materialize_module(model)

model.layer
model.layer.layer  # should raise AttributeError, but currently does not
model.layer.layer.layer.layer.layer.layer  # should raise AttributeError, but currently does not
print(model)  # raises RecursionError
Traceback (most recent call last):
  File "/Users/nitta/work/github.com/PyTorchLightning/pytorch-lightning/bug_meta.py", line 12, in <module>
    print(model)
  File "/Users/nitta/.miniconda3/envs/dev39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1828, in __repr__
    mod_str = repr(module)
  File "/Users/nitta/.miniconda3/envs/dev39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1828, in __repr__
    mod_str = repr(module)
  File "/Users/nitta/.miniconda3/envs/dev39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1828, in __repr__
    mod_str = repr(module)
  [Previous line repeated 328 more times]
  File "/Users/nitta/.miniconda3/envs/dev39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1822, in __repr__
    extra_repr = self.extra_repr()
  File "/Users/nitta/.miniconda3/envs/dev39/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 106, in extra_repr
    return 'in_features={}, out_features={}, bias={}'.format(
RecursionError: maximum recursion depth exceeded while getting the str of an object
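When debugging this, it can help to locate the cycle without calling `print(model)`. The following is a rough sketch of such a check, not part of Lightning: `find_cycle` and `FakeModule` are hypothetical names, and the helper only assumes that children can be enumerated as `(name, child)` pairs, which `torch.nn.Module.named_children()` provides.

```python
def find_cycle(root, children_of):
    """Return the attribute path to the first child that is also one of
    its own ancestors, or None if the tree has no such cycle."""

    def visit(node, path, ancestors):
        ancestors = ancestors | {id(node)}
        for name, child in children_of(node):
            if id(child) in ancestors:
                return path + [name]
            found = visit(child, path + [name], ancestors)
            if found is not None:
                return found
        return None

    return visit(root, [], frozenset())


class FakeModule:
    """Hypothetical stand-in exposing named_children() like nn.Module."""

    def __init__(self):
        self._children = {}

    def named_children(self):
        return list(self._children.items())


model, layer = FakeModule(), FakeModule()
model._children["layer"] = layer
no_cycle = find_cycle(model, lambda m: m.named_children())

# Reproduce the faulty state: the child contains itself.
layer._children["layer"] = layer
cycle_path = find_cycle(model, lambda m: m.named_children())
```

On a real model, the same call with `lambda m: m.named_children()` should report a path such as `["layer", "layer"]` if the materialized module was registered under itself, though that has not been verified against the repro here.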

Expected behavior

No error.

Environment

Same env as the CI.

  • PyTorch Lightning Version (e.g., 1.5.0): master
  • PyTorch Version (e.g., 1.10): 1.11.0
  • Python version (e.g., 3.9): 3.9
  • OS (e.g., Linux):
  • CUDA/cuDNN version:
  • GPU models and configuration:
  • How you installed PyTorch (conda, pip, source):
  • If compiling from source, the output of torch.__config__.show():
  • Any other relevant information:

Additional context

In #12984, we found that these two standalone test cases fail due to RecursionError:

FAILED tests/strategies/test_deepspeed_strategy.py::test_deepspeed_with_meta_device
FAILED tests/strategies/test_deepspeed_strategy.py::test_deepspeed_with_meta_device

on which the minimal code above is based.

cc @otaj

Labels: bug (Something isn't working)
