Closed
Labels: question, won't fix
Description
This is the checkpoint callback I pass to pl.Trainer():
trainer = pl.Trainer(
    callbacks=[ModelCheckpoint(
        monitor='val_loss',
        filepath=os.path.join(hparams.default_root_dir,
                              '{epoch}-{val_loss:.2f}-{test_acc:.2f}'),
        verbose=True)],
)
But the app crashes during the first epoch with the following error:
Exception has occurred: ValueError
.save_function() not set
File "/home/AAA/anaconda3/envs/BBB/lib/python3.7/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 133, in _save_model
raise ValueError(".save_function() not set")
File "/home/AAA/anaconda3/envs/BBB/lib/python3.7/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 240, in _do_check_save
self._save_model(filepath)
File "/home/AAA/anaconda3/envs/BBB/lib/python3.7/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 208, in on_validation_end
self._do_check_save(filepath, current, epoch)
File "/home/AAA/anaconda3/envs/BBB/lib/python3.7/site-packages/pytorch_lightning/trainer/callback_hook.py", line 63, in on_validation_end
callback.on_validation_end(self, self.get_model())
File "/home/AAA/anaconda3/envs/BBB/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 792, in call_checkpoint_callback
self.on_validation_end()
File "/home/AAA/anaconda3/envs/BBB/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 477, in run_training_epoch
self.call_checkpoint_callback()
File "/home/AAA/anaconda3/envs/BBB/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 363, in train
self.run_training_epoch()
File "/home/AAA/anaconda3/envs/BBB/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 865, in run_pretrain_routine
self.train()
File "/home/AAA/anaconda3/envs/BBB/lib/python3.7/site-packages/pytorch_lightning/trainer/distrib_parts.py", line 477, in single_gpu_train
self.run_pretrain_routine(model)
File "/home/AAA/anaconda3/envs/BBB/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 705, in fit
self.single_gpu_train(model)
File "/home/AAA/PycharmProjects/DL2020LiorWolf/train.py", line 110, in main_train
trainer.fit(model)
File "/home/AAA/PycharmProjects/DL2020LiorWolf/train.py", line 40, in main
main_train(model_class_pointer, hyperparams, logger)
File "/home/AAA/PycharmProjects/DL2020LiorWolf/train.py", line 118, in <module>
main()
File "/home/AAA/anaconda3/envs/BBB/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/AAA/anaconda3/envs/BBB/lib/python3.7/runpy.py", line 96, in _run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "/home/AAA/anaconda3/envs/BBB/lib/python3.7/runpy.py", line 263, in run_path
pkg_name=pkg_name, script_name=fname)
File "/home/AAA/anaconda3/envs/BBB/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/AAA/anaconda3/envs/BBB/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
From the docs, the model_checkpoint module seems to be plug-and-play. Do I need to implement something else?
Actually, going through the source code, it seems that save_function is never set.
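To make the suspected failure mode concrete, here is a minimal, library-free sketch. It assumes (this is my reading of the traceback, not something confirmed by the docs) that the trainer assigns save_function only to the callback given through its dedicated checkpoint argument and not to entries in the generic callbacks list. SketchCheckpoint and SketchTrainer are hypothetical stand-ins, not Lightning classes.

```python
from typing import Callable, List, Optional


class SketchCheckpoint:
    """Hypothetical stand-in for a checkpoint callback (not the Lightning class)."""

    def __init__(self) -> None:
        # Starts unset; something else must assign it before the first save.
        self.save_function: Optional[Callable[[str], None]] = None

    def on_validation_end(self, filepath: str) -> None:
        if self.save_function is None:
            raise ValueError(".save_function() not set")
        self.save_function(filepath)


class SketchTrainer:
    """Hypothetical trainer that wires only its dedicated checkpoint slot."""

    def __init__(self, checkpoint_callback=None, callbacks=None) -> None:
        self.saved: List[str] = []
        self.checkpoint_callback = checkpoint_callback
        self.callbacks = list(callbacks or [])
        # Assumed behaviour: only the dedicated slot receives a save function.
        if self.checkpoint_callback is not None:
            self.checkpoint_callback.save_function = self.saved.append

    def run_validation_end(self, filepath: str) -> None:
        if self.checkpoint_callback is not None:
            self.checkpoint_callback.on_validation_end(filepath)
        for callback in self.callbacks:
            callback.on_validation_end(filepath)


# Passed through the dedicated argument: the save function is wired.
wired = SketchTrainer(checkpoint_callback=SketchCheckpoint())
wired.run_validation_end("epoch=0.ckpt")

# Passed through the generic list: nothing wires it, so the error is raised.
unwired = SketchTrainer(callbacks=[SketchCheckpoint()])
try:
    unwired.run_validation_end("epoch=0.ckpt")
    error_message = None
except ValueError as err:
    error_message = str(err)
```

If that assumption holds for the installed version, passing the ModelCheckpoint instance through the Trainer's checkpoint_callback argument instead of callbacks would be the first thing to try.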