🐛 Bug
Runtime error when using DDP together with apex.amp.
To Reproduce
Replace the Trainer in the BoringModel with:

```python
trainer = pl.Trainer(
    max_epochs=1,
    gpus=1,
    num_nodes=1,
    accelerator='ddp',
    amp_backend='apex',
    amp_level='O3',
    precision=16,
    progress_bar_refresh_rate=20
)
```

Then export the BoringModel as a .py file and run it on a machine with apex installed.
apex.amp is initialized at https://github.com/PyTorchLightning/pytorch-lightning/blob/e272bea4dcf679d15ab836310c82193641e79778/pytorch_lightning/plugins/precision/apex_amp.py#L41, which is called from https://github.com/PyTorchLightning/pytorch-lightning/blob/e272bea4dcf679d15ab836310c82193641e79778/pytorch_lightning/trainer/trainer.py#L440 . However, apex.amp requires the model to already be on the GPU when it is initialized, while the transfer to the GPU at https://github.com/PyTorchLightning/pytorch-lightning/blob/e272bea4dcf679d15ab836310c82193641e79778/pytorch_lightning/plugins/training_type/ddp.py#L272,
called from https://github.com/PyTorchLightning/pytorch-lightning/blob/e272bea4dcf679d15ab836310c82193641e79778/pytorch_lightning/trainer/trainer.py#L481, only happens after that initialization. As a result, the if-statement at https://github.com/PyTorchLightning/pytorch-lightning/blob/e272bea4dcf679d15ab836310c82193641e79778/pytorch_lightning/plugins/precision/apex_amp.py#L50 evaluates to False and the apex configuration is silently skipped.
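The ordering constraint can be sketched with plain-Python stand-ins (the `Model` class and `amp_initialize` below are hypothetical placeholders, not the real torch/apex APIs): initialization only takes effect when the model's parameters already live on the GPU, so moving the model afterwards leaves the precision plugin unconfigured.

```python
# Hypothetical stand-ins illustrating the ordering bug; these do NOT
# use the real torch or apex APIs.

class Model:
    def __init__(self):
        self.device = "cpu"  # models start on the CPU

    def cuda(self):
        self.device = "cuda"  # stand-in for moving parameters to the GPU
        return self


def amp_initialize(model):
    """Stand-in for apex.amp.initialize: it only configures mixed
    precision when the model is already on the GPU."""
    return model.device == "cuda"  # True -> apex actually configured


# Current (buggy) order: initialize before the DDP plugin calls .cuda()
model = Model()
configured_before_move = amp_initialize(model)  # False: model still on CPU
model.cuda()                                    # too late, apex was skipped

# Fixed order: move the model to the GPU first, then initialize
model = Model()
model.cuda()
configured_after_move = amp_initialize(model)   # True

print(configured_before_move, configured_after_move)  # → False True
```

In other words, swapping the two trainer steps (device transfer before precision-plugin setup) would let the guard at apex_amp.py#L50 pass.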