Description
🐛 Bug
I'm using a ReduceLROnPlateau scheduler that monitors my validation F1 score. When I tell PyTorch Lightning to run validation only every 5 epochs (i.e. check_val_every_n_epoch > 1), validation_step does not run on the intermediate epochs, so nothing is logged there and the val.f1 metric is never set. At the end of such a training epoch, Lightning looks up val.f1 for the scheduler, cannot find it (it only sees my train metrics), and throws an error.

To work around this, I tried logging a placeholder value of 0 for val.f1 in on_train_start, which according to the documentation should be called exactly once, right before the epoch loop (see https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html#hooks). During training, however, this hook is called after every epoch. This is not the expected behaviour.
(Attached screenshot: example of val.f1 being logged multiple times.)
To Reproduce
Include the on_train_start code below in your LightningModule, configure a ReduceLROnPlateau scheduler that monitors a validation metric, and set check_val_every_n_epoch > 1.
def on_train_start(self):
    self.log("val.f1", 0)
Expected behavior
on_train_start is only called once.
(Also, an LR scheduler that can't find its metric should only raise a warning, not throw an error.)
Packages:
- numpy: 1.21.5
- pyTorch_debug: False
- pyTorch_version: 1.11.0
- pytorch-lightning: 1.6.0
- tqdm: 4.63.0
cc @carmocca @edward-io @ananthsub @rohitgr7 @kamil-kaczmarek @Raalsky @Blaizzy