🐛 Bug
Using the LearningRateMonitor callback breaks wandb logging by causing the step count to become incorrect. The image below shows how the epoch/step counts vary while overfitting batches with no LR monitor, LearningRateMonitor(logging_interval="epoch"), and LearningRateMonitor(logging_interval=None):

- Neat-bee-446 does not use the LearningRateMonitor callback, and the step:epoch ratio is 1:1.
- Devout-forest-447 adds LearningRateMonitor(logging_interval="epoch") as a callback, and the step:epoch ratio becomes 2:1.
- Woven-dew-448 uses the LearningRateMonitor() callback, and the step:epoch ratio becomes 3:1.
When not overfitting a batch, LearningRateMonitor() logs the correct number of steps, but LearningRateMonitor(logging_interval="epoch") and LearningRateMonitor(logging_interval="step") still log double the expected number.
Also, this only happens with the wandb logger; it does not occur with TensorBoard.
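For reference, a minimal sketch along these lines should reproduce the mismatch (the toy model, dataset, project name, and Trainer flags here are assumptions for illustration, not the exact code behind the runs above):

```python
import torch
from torch.utils.data import DataLoader, Dataset
import pytorch_lightning as pl
from pytorch_lightning.callbacks import LearningRateMonitor
from pytorch_lightning.loggers import WandbLogger


class RandomDataset(Dataset):
    def __len__(self):
        return 64

    def __getitem__(self, idx):
        return torch.randn(32)


class BoringModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        loss = self.layer(batch).sum()
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


if __name__ == "__main__":
    model = BoringModel()
    trainer = pl.Trainer(
        max_epochs=5,
        overfit_batches=1,
        logger=WandbLogger(project="lr-monitor-repro"),  # hypothetical project name
        # Comment this callback out and the wandb step:epoch ratio stays 1:1;
        # with it, the logged step count inflates as described above.
        callbacks=[LearningRateMonitor(logging_interval="epoch")],
    )
    trainer.fit(model, DataLoader(RandomDataset(), batch_size=8))
```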
Expected behavior
The logged step count should be correct and not adversely impacted by adding the LearningRateMonitor callback.
Environment
- PyTorch Lightning Version: 1.6.2
- WandB Version: 0.12.5
- PyTorch Version (e.g., 1.11):
- Python version (e.g., 3.8.10):
- OS (e.g., Linux): Linux
- How you installed PyTorch: Pip
Additional context
cc @awaelchli @morganmcg1 @AyushExel @borisdayma @scottire @manangoel99 @rohitgr7