enable_pl_optimizer causes optimizers to not be restored properly #5224

@PiotrDabkowski

Description

🐛 Bug

enable_pl_optimizer (which defaults to True!) causes optimizer state not to be restored properly from the checkpoint specified by resume_from_checkpoint.

BoringModel Colab Reproduction

The model is trained for 3 epochs and saved to a checkpoint. The checkpoint is then restored and training continues for 1 more epoch (with different values of enable_pl_optimizer), printing the training loss at each step.
With enable_pl_optimizer=True there is a huge loss spike after the first optimizer step, suggesting that the optimizer state is not restored properly.

https://colab.research.google.com/drive/1lHYXm4MpnmXwPZTcPem4D4wwwU5vJhHc?usp=sharing
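The spike is consistent with the optimizer's running state (e.g. Adam's moment estimates) being reset at resume time rather than loaded from the checkpoint. The framework-free sketch below (pure Python; `adam_step`, `fresh_state`, and `train` are illustrative names, not Lightning internals) shows why that matters: restoring the full optimizer state reproduces the uninterrupted run exactly, while resuming with fresh state changes the first post-resume step.

```python
import math

def adam_step(param, grad, state, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # One Adam update; the result depends on the running moments in `state`.
    state["t"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * grad
    state["v"] = b2 * state["v"] + (1 - b2) * grad * grad
    m_hat = state["m"] / (1 - b1 ** state["t"])
    v_hat = state["v"] / (1 - b2 ** state["t"])
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)

def fresh_state():
    return {"t": 0, "m": 0.0, "v": 0.0}

def train(param, state, steps):
    for _ in range(steps):
        param = adam_step(param, 2 * param, state)  # gradient of param**2
    return param

# Uninterrupted run: 6 steps straight through.
uninterrupted = train(1.0, fresh_state(), 6)

# Checkpointed run: 5 steps, then save both the param and the optimizer state.
state_b = fresh_state()
ckpt_param = train(1.0, state_b, 5)
checkpoint = {"param": ckpt_param, "opt_state": dict(state_b)}

# Resume with the saved optimizer state vs. with a fresh one.
restored = train(checkpoint["param"], dict(checkpoint["opt_state"]), 1)
reset = train(checkpoint["param"], fresh_state(), 1)

print(uninterrupted, restored, reset)  # restored matches; reset does not
```

Under this reading, enable_pl_optimizer=True would correspond to the `reset` branch: the checkpoint's optimizer state never reaches the wrapped optimizer, so the first resumed step is taken with blank moment estimates, hence the loss spike.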

Expected behavior

PL optimizers are restored from the checkpoint such that there is no loss spike after resuming, just as when enable_pl_optimizer=False.

Environment

See Colab.

Labels

bug (Something isn't working) · help wanted (Open to be worked on) · priority: 1 (Medium priority task)
