Closed
Labels: feature (is an improvement or enhancement), help wanted (open to be worked on)
Description
🚀 Feature
When ckpt_path is passed to the test/validate/predict functions of the Trainer, they should load the weights from that checkpoint even if a model is also provided.
Motivation
I noticed that one of our DeepSpeed tests was incorrect (see here). resume_from_checkpoint does not reload the weights for test/validate/predict, which is probably the right thing to do. However, when I modified the test to pass ckpt_path to the test function, I noticed that the weights are still not loaded: ignoring ckpt_path when a model is provided is the current default behaviour.
As discussed with @carmocca, I suggest we change the behaviour as follows:
BEFORE
trainer.test(model, ckpt_path=None) # use provided model
trainer.test(model, ckpt_path='best') # use provided model, ignore ckpt_path
trainer.test(model, ckpt_path='my_path') # use provided model, ignore ckpt_path
trainer.fit(model)
# then
trainer.test(ckpt_path=None) # use latest model
trainer.test(ckpt_path='my_path') # load path
AFTER
trainer.test(model, ckpt_path=None) # use provided model
trainer.test(model, ckpt_path='best') # load best model
trainer.test(model, ckpt_path='my_path') # load path
trainer.fit(model)
# then
trainer.test(ckpt_path=None) # load best model
trainer.test(ckpt_path='my_path') # load path
This, IMO, brings the behaviour in line with what users expect. It also allows DeepSpeed to be used as an engine in cases where inference cannot happen without the Trainer (for example, when sharding orchestration is involved).
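The AFTER rules above can be written down as a small decision function. This is only a sketch of the proposed behaviour, not the Trainer's actual implementation: the function name `resolve_ckpt_path` and the `model_provided` and `best_model_path` parameters are hypothetical, chosen for illustration. A return value of `None` means "use the provided model's weights as they are".

```python
from typing import Optional


def resolve_ckpt_path(
    model_provided: bool,
    ckpt_path: Optional[str],
    best_model_path: str,
) -> Optional[str]:
    """Return the checkpoint to load, or None to keep the provided model's weights.

    Hypothetical helper mirroring the AFTER table in this issue; it is not
    part of the Lightning API.
    """
    if ckpt_path is None:
        # With a model passed in, use it untouched; without one, fall back
        # to the best checkpoint from the preceding fit.
        return None if model_provided else best_model_path
    if ckpt_path == "best":
        # 'best' is honoured whether or not a model was provided.
        return best_model_path
    # Any other value is treated as an explicit path and always loaded.
    return ckpt_path
```

The point of the proposal is the last two branches: an explicit `ckpt_path` is no longer silently ignored when a model is passed alongside it.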