
Lightning + DDP + apex.amp fails to initialize #7271

@caic99


🐛 Bug

Runtime error when using DDP with apex.amp.

To Reproduce

In the Boring Model, replace the `Trainer` construction with:

    trainer = pl.Trainer(
        max_epochs=1,
        gpus=1,
        num_nodes=1,
        accelerator='ddp',
        amp_backend='apex',
        amp_level='O3',
        precision=16,
        progress_bar_refresh_rate=20
    )

Then export the Boring Model as a .py file and run it on a machine with apex installed.

apex.amp is initialized at https://github.com/PyTorchLightning/pytorch-lightning/blob/e272bea4dcf679d15ab836310c82193641e79778/pytorch_lightning/plugins/precision/apex_amp.py#L41, which is called from https://github.com/PyTorchLightning/pytorch-lightning/blob/e272bea4dcf679d15ab836310c82193641e79778/pytorch_lightning/trainer/trainer.py#L440 . However, apex.amp initialization requires the model to already be on the GPU, and the model is only moved there at https://github.com/PyTorchLightning/pytorch-lightning/blob/e272bea4dcf679d15ab836310c82193641e79778/pytorch_lightning/plugins/training_type/ddp.py#L272 , called from https://github.com/PyTorchLightning/pytorch-lightning/blob/e272bea4dcf679d15ab836310c82193641e79778/pytorch_lightning/trainer/trainer.py#L481 , which runs after the amp initialization. As a result, the if-statement at https://github.com/PyTorchLightning/pytorch-lightning/blob/e272bea4dcf679d15ab836310c82193641e79778/pytorch_lightning/plugins/precision/apex_amp.py#L50 sees a model that is not yet on the GPU and skips the apex configuration.
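To make the ordering problem concrete, here is a simplified, hypothetical model of the two calls in plain Python. `FakeModel`, `FakeApexPlugin`, `run`, and the step names are illustrative stand-ins, not Lightning or apex APIs, and the guard assumes the check at apex_amp.py#L50 skips configuration whenever the model is not on CUDA, as the explanation above describes. It only shows that the outcome depends on call order.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeModel:
    # Hypothetical stand-in for an nn.Module; only tracks its device.
    device: str = "cpu"

    def to_gpu(self) -> "FakeModel":
        self.device = "cuda"
        return self


@dataclass
class FakeApexPlugin:
    # Hypothetical stand-in for the apex precision plugin.
    # Records whether the amp configuration step actually ran.
    configured: bool = False

    def connect(self, model: FakeModel) -> FakeModel:
        # Assumed guard: skip apex setup unless the model is already on the GPU.
        if model.device != "cuda":
            return model
        self.configured = True  # placeholder for apex.amp.initialize(...)
        return model


def run(order: List[str]) -> bool:
    # Execute the two setup steps in the given order and report
    # whether the amp configuration was applied.
    model, plugin = FakeModel(), FakeApexPlugin()
    for step in order:
        if step == "connect_precision":
            model = plugin.connect(model)
        elif step == "model_to_device":
            model = model.to_gpu()
    return plugin.configured


# Order reported in the issue: precision plugin connected before the move to GPU.
print(run(["connect_precision", "model_to_device"]))  # False
# Moving the model to the GPU first lets the guard pass.
print(run(["model_to_device", "connect_precision"]))  # True
```

In the first order the guard returns early and the configuration is silently skipped; no exception is raised at that point, which is consistent with the failure only surfacing later as a runtime error.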


Labels: bug (Something isn't working), help wanted (Open to be worked on), priority: 0 (High priority task)
