Optimizers are broken with auto_lr_find=True since 1.1.4 #6285

@indigoviolet

Description

🐛 Bug

It seems that #5244 (which went out with 1.1.4) interacts badly with auto_lr_find=True.

Specifically, lightning_optimizers are now cached on the Trainer. When the learning rate is updated via auto_lr_find=True, the optimizers returned from configure_optimizers change, so the cached lightning_optimizers need to be updated as well. This no longer happens, because the optimizers are no longer re-wrapped in the general case.

The outcome for me is that training simply doesn't converge, because we end up stepping the wrong optimizer.
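
To make the caching interaction concrete, here is a small plain-PyTorch sketch of the pitfall (my paraphrase of what I believe is happening, not a trace of Lightning internals): a wrapper cached around the optimizer returned by an earlier configure_optimizers call keeps stepping that old object even after a new optimizer with the tuned lr is created.

```python
import torch

# Hypothetical stand-in for a LightningModule's configure_optimizers()
params = [torch.nn.Parameter(torch.zeros(1))]

def configure_optimizers(lr):
    return torch.optim.SGD(params, lr=lr)

cached = configure_optimizers(lr=1e-5)   # wrapped and cached before lr tuning
tuned = configure_optimizers(lr=1e-2)    # re-created after auto_lr_find picks a new lr

assert cached is not tuned               # the cache now points at a stale object
print(cached.param_groups[0]["lr"])      # 1e-05 -- training keeps using the old lr
```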

Please reproduce using the BoringModel

https://colab.research.google.com/drive/1PJGOBSUdl5_-U9O-fvo83V1On6_siwAC?usp=sharing

To Reproduce

See the colab linked above; a minimal, self-contained sketch of the same setup follows below.
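
In case the colab link goes stale, this is roughly the kind of setup it exercises (the class and data below are my own toy stand-ins, not the exact colab code): a model exposing a tunable self.lr, tuned with auto_lr_find and then fitted with the same Trainer.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl


class ToyModel(pl.LightningModule):
    """Toy stand-in for the BoringModel with a tunable learning rate."""

    def __init__(self, lr=1e-5):
        super().__init__()
        self.lr = lr  # attribute that auto_lr_find overwrites with its suggestion
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.mse_loss(self(x), y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        # Returns a *new* optimizer object on every call, built from self.lr
        return torch.optim.SGD(self.parameters(), lr=self.lr)


train_loader = DataLoader(
    TensorDataset(torch.randn(640, 32), torch.randn(640, 2)), batch_size=32
)

model = ToyModel()
trainer = pl.Trainer(auto_lr_find=True, max_epochs=5)
trainer.tune(model, train_loader)  # lr finder writes its suggestion to model.lr
trainer.fit(model, train_loader)   # on >= 1.1.4 the loss stalls: the cached
                                   # LightningOptimizer still wraps the pre-tune optimizer
```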

Expected behavior

Training should work! The learning rate suggested by the tuner should be applied to the optimizer that is actually stepped during training, and the model should converge as it did before 1.1.4.

Environment

  • CUDA:
    • GPU:
      • Tesla T4
    • available: True
    • version: 10.1
  • Packages:
    • numpy: 1.19.5
    • pyTorch_debug: False
    • pyTorch_version: 1.7.1+cu101
    • pytorch-lightning: 1.2.1
    • tqdm: 4.41.1
  • System:
    • OS: Linux
    • architecture:
      • 64bit
    • processor: x86_64
    • python: 3.7.10
    • version: #1 SMP Thu Jul 23 08:00:38 PDT 2020

Additional context

  1. This was a pretty frustrating bug to track down: it broke training on my model in a seemingly unrelated way, and I had to literally git bisect both my repo and pytorch-lightning's repo to find it.

  2. It's scary to me that the bug seems to have gone unnoticed for so many versions -- does no one use auto_lr_find=True? Are there no test cases checking this combination?
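
For what it's worth, a possible interim workaround (just my assumption based on the behavior above, not an official recommendation) is to run the lr finder on a throwaway Trainer and then fit with a fresh one, so the optimizer wrappers are built from the tuned lr. Reusing the toy model from the sketch above:

```python
import pytorch_lightning as pl

model = ToyModel()  # ToyModel / train_loader as defined in the repro sketch above

tune_trainer = pl.Trainer(auto_lr_find=True)
tune_trainer.tune(model, train_loader)   # writes the suggested lr to model.lr

fit_trainer = pl.Trainer(max_epochs=5)   # fresh Trainer -> optimizers are wrapped anew
fit_trainer.fit(model, train_loader)
```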

Labels

bug, help wanted, priority: 0, tuner
