Logging inside on_train_start executed multiple times during training #13554

@paulhager

Description


🐛 Bug

I'm using a ReduceLROnPlateau scheduler that monitors my validation F1. When I tell PyTorch Lightning to run validation only every 5 epochs (i.e. check_val_every_n_epoch > 1), nothing is logged from my validation_step method, so the val.f1 metric is never set. At the end of a training epoch, the scheduler looks up val.f1 and throws an error because it can't find it (it only sees my train metrics).

To work around this, I tried logging a value of 0 for val.f1 in on_train_start, which according to the documentation should be called exactly once, right before the epoch loop (see https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html#hooks). During training, however, this hook is called again after every epoch. This is not the expected behaviour.
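For context, here is a minimal sketch of the kind of setup I mean (the class name, layer sizes, and the accuracy stand-in for my real F1 metric are all placeholders):

```python
import torch
import pytorch_lightning as pl
from torch.optim.lr_scheduler import ReduceLROnPlateau


class LitModel(pl.LightningModule):  # placeholder module
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.cross_entropy(self.layer(x), y)
        self.log("train.loss", loss)
        return loss

    def validation_step(self, batch, batch_idx):
        x, y = batch
        preds = self.layer(x).argmax(dim=-1)
        f1 = (preds == y).float().mean()  # stand-in for a real F1 metric
        self.log("val.f1", f1)  # only logged when validation actually runs

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        scheduler = ReduceLROnPlateau(optimizer, mode="max")
        # the scheduler monitors the metric logged in validation_step
        return {
            "optimizer": optimizer,
            "lr_scheduler": {"scheduler": scheduler, "monitor": "val.f1"},
        }
```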

(Screenshot in the original issue: example of val.f1 being logged multiple times across epochs.)

To Reproduce

Include the on_train_start override below in your LightningModule, use a ReduceLROnPlateau scheduler that monitors a validation metric, and set check_val_every_n_epoch > 1.

```python
def on_train_start(self):
    self.log("val.f1", 0)
```
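Then run training with validation checks spaced out, for example (model and dataloader names are placeholders, reusing the sketch above):

```python
from pytorch_lightning import Trainer

model = LitModel()  # the placeholder module sketched earlier
trainer = Trainer(max_epochs=20, check_val_every_n_epoch=5)
trainer.fit(model, train_dataloaders=train_loader, val_dataloaders=val_loader)
```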

Expected behavior

on_train_start is called exactly once, before the first training epoch.

(Also, an LR scheduler that can't find its metric should raise a warning rather than an error.)
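As a stopgap until this is fixed, the seed log can be guarded so the repeated hook calls become no-ops; this is just my assumption of a safe pattern, not an officially documented one:

```python
def on_train_start(self):
    # Seed the placeholder only once; the extra calls after each epoch
    # (the buggy behaviour) then no longer overwrite the real metric.
    if self.trainer.current_epoch == 0:
        self.log("val.f1", 0.0)
```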

Packages:
- numpy: 1.21.5
- pyTorch_debug: False
- pyTorch_version: 1.11.0
- pytorch-lightning: 1.6.0
- tqdm: 4.63.0

cc @carmocca @edward-io @ananthsub @rohitgr7 @kamil-kaczmarek @Raalsky @Blaizzy

Labels

bug (Something isn't working) · logging (Related to the `LoggerConnector` and `log()`)
