
[BUG] auto_move_data does not work with DataParallel #6563

@YannDubs

Description


🐛 Bug

If your forward function is wrapped with auto_move_data, it will not work with DataParallel, because the decorator moves the data to self.device, which under DataParallel is always the main (root) device.

That is, the following will not work with accelerator="dp" (and probably not with "ddp" either):

import pytorch_lightning as pl
from pytorch_lightning.core.decorators import auto_move_data

class Module(pl.LightningModule):
    ...

    @auto_move_data
    def forward(self, x):
        ...

    def training_step(self, batch, batch_idx):
        x = self.forward(batch[0])
        ...
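
For reference, a minimal sketch of how to trigger this (assuming a machine with 2 GPUs and a train_dataloader defined elsewhere; the Trainer arguments follow the accelerator="dp" setting mentioned above):

import pytorch_lightning as pl

trainer = pl.Trainer(gpus=2, accelerator="dp")
trainer.fit(Module(), train_dataloader)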

The error comes from this line: https://github.com/PyTorchLightning/pytorch-lightning/blob/b190403e282cbcb71147c7b618654476b08578a5/pytorch_lightning/core/hooks.py#L646

self.device should probably be replaced by torch.distributed.get_rank() when torch.distributed.is_available() and torch.distributed.is_initialized() both return True.
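
A minimal sketch of what that could look like (purely illustrative, not the actual Lightning implementation; _target_device is a hypothetical helper, and mapping the rank directly to a CUDA device index assumes a single-node setup):

import torch

def _target_device(module):
    # Hypothetical helper: when torch.distributed is initialized, use the
    # current process's rank as the CUDA device index (single-node assumption);
    # otherwise fall back to module.device, which is the current behavior.
    if torch.distributed.is_available() and torch.distributed.is_initialized():
        return torch.device("cuda", torch.distributed.get_rank())
    return module.device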
