LearningRateMonitor callback causes unexpected changes in step/epoch count with WandbLogger #13016

@rbracco

Description

🐛 Bug

Using the LearningRateMonitor callback breaks wandb logging by causing the logged step count to become incorrect. The image below shows the varying epoch/step counts while overfitting batches with no LR monitor, LearningRateMonitor(logging_interval="epoch"), and LearningRateMonitor(logging_interval=None).
[Image: W&B charts comparing step and epoch counts for the three runs]

  • Neat-bee-446 does not use the LearningRateMonitor callback, and the ratio of step# to epoch# is 1:1.
  • Devout-forest-447 adds LearningRateMonitor(logging_interval="epoch") as a callback, and the ratio of step# to epoch# becomes 2:1.
  • Woven-dew-448 uses LearningRateMonitor(), and the ratio of step# to epoch# becomes 3:1.

When not overfitting a batch, LearningRateMonitor() logs the correct number of steps, but LearningRateMonitor(logging_interval="epoch") and LearningRateMonitor(logging_interval="step") still log double the expected count.

Also, this does not occur with TensorBoard, only with wandb.
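Below is a minimal sketch of how the behaviour can presumably be reproduced. The dataset, model, optimizer/scheduler, and W&B project name are placeholders (not from the original report); swap the LearningRateMonitor configuration to match the three runs above.

```python
import torch
from torch.utils.data import DataLoader, Dataset
from pytorch_lightning import LightningModule, Trainer
from pytorch_lightning.callbacks import LearningRateMonitor
from pytorch_lightning.loggers import WandbLogger


class RandomDataset(Dataset):
    # Tiny synthetic dataset, stand-in for the real data (hypothetical).
    def __init__(self, size=64, dim=32):
        self.data = torch.randn(size, dim)

    def __getitem__(self, idx):
        return self.data[idx]

    def __len__(self):
        return len(self.data)


class BoringModel(LightningModule):
    # Minimal LightningModule, stand-in for the real model (hypothetical).
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        loss = self.layer(batch).sum()
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        optimizer = torch.optim.SGD(self.parameters(), lr=0.1)
        scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1)
        return [optimizer], [scheduler]


if __name__ == "__main__":
    trainer = Trainer(
        logger=WandbLogger(project="lr-monitor-repro"),  # project name is made up
        # Swap for LearningRateMonitor() or logging_interval="step" to compare runs;
        # remove the callback entirely to get the 1:1 step:epoch baseline.
        callbacks=[LearningRateMonitor(logging_interval="epoch")],
        overfit_batches=1,
        max_epochs=5,
    )
    trainer.fit(BoringModel(), DataLoader(RandomDataset(), batch_size=8))
```

Comparing the resulting W&B runs with and without the callback should show whether the step:epoch ratio diverges as described above.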

Expected behavior

The logged step count should be correct and not be adversely impacted by adding the LearningRateMonitor callback.

Environment

  • PyTorch Lightning Version: 1.6.2
  • WandB Version: 0.12.5
  • PyTorch Version (e.g., 1.11):
  • Python version (e.g., 3.8.10):
  • OS (e.g., Linux): Linux
  • How you installed PyTorch: Pip

Additional context

cc @awaelchli @morganmcg1 @AyushExel @borisdayma @scottire @manangoel99 @rohitgr7
