Description & Motivation
Hi,
I've been trying to run full validation at step 0, but everything I've tried has failed in some way. I am aware of this stale issue, but I could not re-open it, so I created this one. Running full validation at step 0 is very useful when fine-tuning an already well-performing model.
These are the things I've tried:
- Using `trainer.validate()` (added in #4948) fails with DDP: if you manually invoke the `validate` method, strange things happen with the dataloaders and the DDP checks fail afterwards.
- Setting `trainer.num_sanity_val_steps` to `-1` so the sanity check runs on the full validation dataset also fails, because the loggers are not properly set up during sanity checking. I tried several variants where I manually set the loggers before the sanity check and even forced them to log, but those also failed and became unnecessarily hacky.
- Temporarily setting `trainer.val_check_interval` to `1` to force validation to happen at step 1 at least, but restoring the original value afterwards took no effect and the trainer kept validating at every step.
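The third attempt is essentially a save/restore of a Trainer attribute. A minimal, dependency-free sketch of that pattern (using a stand-in object, not the real Lightning `Trainer`; my guess is the real Trainer derives internal scheduling state from this attribute, which would explain why restoring the raw value took no effect):

```python
from contextlib import contextmanager
from types import SimpleNamespace

@contextmanager
def temporary_attr(obj, name, value):
    """Set obj.<name> to value, restoring the original value on exit."""
    original = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield obj
    finally:
        setattr(obj, name, original)

# Stand-in for a Trainer; the real one may cache scheduling state derived
# from this attribute, so a plain restore is not guaranteed to take effect.
trainer = SimpleNamespace(val_check_interval=1000)

with temporary_attr(trainer, "val_check_interval", 1):
    assert trainer.val_check_interval == 1  # validation would trigger at step 1
assert trainer.val_check_interval == 1000   # restored on exit
```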
I feel like this should be easier to do and maybe I'm missing something.
Thanks in advance.
Pitch
Running validation at step 0 is important for many fine-tuning pipelines, and I think it should be easier to do robustly under DDP, without having to hack around the trainer internals.
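To make the request concrete, here is a dependency-free sketch of the desired behavior: a fit loop that runs one full validation epoch at step 0 before any training step. The `validate_on_start` flag and the toy model are hypothetical illustrations, not Lightning API:

```python
class ToyModel:
    """Stand-in for a LightningModule with toy train/val steps."""
    def __init__(self):
        self.weight = 2.0

    def training_step(self, batch):
        self.weight -= 0.1 * batch  # pretend optimizer step

    def validation_step(self, batch):
        return abs(self.weight - batch)  # pretend metric

def fit(model, train_batches, val_batches, validate_on_start=True):
    logged = []
    if validate_on_start:
        # Full validation pass at step 0: records the baseline metric of an
        # already well-performing checkpoint before fine-tuning begins.
        baseline = sum(model.validation_step(b) for b in val_batches) / len(val_batches)
        logged.append(("step 0", baseline))
    for batch in train_batches:
        model.training_step(batch)
    return logged

logs = fit(ToyModel(), train_batches=[1.0, 1.0], val_batches=[1.0, 3.0])
# logs[0] holds the step-0 baseline: (|2-1| + |2-3|) / 2 = 1.0
```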
Alternatives
No response
Additional context
No response