Closed
Labels
bug · help wanted · priority: 1 (Medium priority task) · strategy: dp (removed in pl, DataParallel)
Description
🐛 Bug
If your forward function is wrapped with @auto_move_data, it will not work with DataParallel, because the decorator tries to move the data to self.device, which under DataParallel is always the main device.
i.e. the following won't work with accelerator="dp" (and probably also with "ddp"):
import pytorch_lightning as pl
from pytorch_lightning.core.decorators import auto_move_data

class Module(pl.LightningModule):
    ...

    @auto_move_data
    def forward(self, x):
        ...

    def training_step(self, batch, batch_idx):
        x = self.forward(batch[0])
        ...

The error comes from this line: https://github.com/PyTorchLightning/pytorch-lightning/blob/b190403e282cbcb71147c7b618654476b08578a5/pytorch_lightning/core/hooks.py#L646
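For context, here is a minimal sketch of what the decorator effectively does (this is not the actual Lightning source; auto_move_data_sketch and _to_device are illustrative names), which shows why every replica's inputs land on the root device:

import functools
import torch

def _to_device(obj, device):
    # Minimal helper: move tensors (and tensors nested in lists/tuples) to `device`.
    if isinstance(obj, torch.Tensor):
        return obj.to(device)
    if isinstance(obj, (list, tuple)):
        return type(obj)(_to_device(o, device) for o in obj)
    return obj

def auto_move_data_sketch(fn):
    @functools.wraps(fn)
    def wrapped(self, *args, **kwargs):
        # self.device always points at the root device (e.g. cuda:0), even when
        # DataParallel has replicated the module onto cuda:1, cuda:2, ..., so every
        # replica except the first receives inputs on a GPU its parameters are not on.
        args = _to_device(args, self.device)
        kwargs = {k: _to_device(v, self.device) for k, v in kwargs.items()}
        return fn(self, *args, **kwargs)
    return wrapped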
self.device should probably be replaced by a device derived from torch.distributed.get_rank() when torch.distributed.is_available() and torch.distributed.is_initialized().
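A rough sketch of that suggestion, assuming one process per GPU on a single node (target_device is an illustrative name, not an existing Lightning helper):

import torch

def target_device(module):
    # When a process group is initialized (DDP), use the distributed rank to pick the
    # GPU; with one process per GPU on a single node the global rank equals the local
    # GPU index. Otherwise fall back to the LightningModule's self.device.
    if torch.distributed.is_available() and torch.distributed.is_initialized():
        return torch.device("cuda", torch.distributed.get_rank())
    return module.device

Note that plain DataParallel runs in a single process, where torch.distributed is usually not initialized, so a full fix for dp would likely also have to detect the device of the current replica inside forward.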