save_function() not set with save_model callback? #1524

@dvirginz

This is how I pass the callback to the Trainer:

trainer = pl.Trainer(
    callbacks=[ModelCheckpoint(
        monitor='val_loss',
        filepath=os.path.join(hparams.default_root_dir,
                              '{epoch}-{val_loss:.2f}-{test_acc:.2f}'),
        verbose=True)],
    # (remaining Trainer arguments omitted)
)

But the app crashes during the first epoch with the following error:

Exception has occurred: ValueError
.save_function() not set
  File "/home/AAA/anaconda3/envs/BBB/lib/python3.7/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 133, in _save_model
    raise ValueError(".save_function() not set")
  File "/home/AAA/anaconda3/envs/BBB/lib/python3.7/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 240, in _do_check_save
    self._save_model(filepath)
  File "/home/AAA/anaconda3/envs/BBB/lib/python3.7/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 208, in on_validation_end
    self._do_check_save(filepath, current, epoch)
  File "/home/AAA/anaconda3/envs/BBB/lib/python3.7/site-packages/pytorch_lightning/trainer/callback_hook.py", line 63, in on_validation_end
    callback.on_validation_end(self, self.get_model())
  File "/home/AAA/anaconda3/envs/BBB/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 792, in call_checkpoint_callback
    self.on_validation_end()
  File "/home/AAA/anaconda3/envs/BBB/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 477, in run_training_epoch
    self.call_checkpoint_callback()
  File "/home/AAA/anaconda3/envs/BBB/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 363, in train
    self.run_training_epoch()
  File "/home/AAA/anaconda3/envs/BBB/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 865, in run_pretrain_routine
    self.train()
  File "/home/AAA/anaconda3/envs/BBB/lib/python3.7/site-packages/pytorch_lightning/trainer/distrib_parts.py", line 477, in single_gpu_train
    self.run_pretrain_routine(model)
  File "/home/AAA/anaconda3/envs/BBB/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 705, in fit
    self.single_gpu_train(model)
  File "/home/AAA/PycharmProjects/DL2020LiorWolf/train.py", line 110, in main_train
    trainer.fit(model)
  File "/home/AAA/PycharmProjects/DL2020LiorWolf/train.py", line 40, in main
    main_train(model_class_pointer, hyperparams, logger)
  File "/home/AAA/PycharmProjects/DL2020LiorWolf/train.py", line 118, in <module>
    main()
  File "/home/AAA/anaconda3/envs/BBB/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/AAA/anaconda3/envs/BBB/lib/python3.7/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/home/AAA/anaconda3/envs/BBB/lib/python3.7/runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/home/AAA/anaconda3/envs/BBB/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/AAA/anaconda3/envs/BBB/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)

From the docs, the model_checkpoint module looks "plug-and-play". Do I need to implement something else?
Actually, going through the source code, it seems that save_function is never set when the callback is passed this way.
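
A minimal sketch of the suspected failure mode, using stand-in classes rather than Lightning's real code. The names SketchCheckpoint, SketchTrainer, and try_save are hypothetical. The sketch assumes the trainer injects its save function only into the checkpoint object it manages through a dedicated argument, so a checkpoint that is merely listed in callbacks keeps save_function as None and raises the same ValueError as in the traceback.

```python
# Stand-in classes for illustration only; NOT Lightning's real implementations.


class SketchCheckpoint:
    """Hypothetical callback that needs a save function injected by its owner."""

    def __init__(self):
        self.save_function = None  # nothing is wired up at construction time

    def _save_model(self, filepath):
        # Same check as the one that fails in the traceback.
        if self.save_function is None:
            raise ValueError(".save_function() not set")
        self.save_function(filepath)


class SketchTrainer:
    """Hypothetical trainer that wires only the checkpoint it manages itself."""

    def __init__(self, callbacks=None, checkpoint_callback=None):
        self.callbacks = callbacks or []
        self.checkpoint_callback = checkpoint_callback
        self.saved = []
        if self.checkpoint_callback is not None:
            # Assumption: injection happens for the dedicated argument only.
            self.checkpoint_callback.save_function = self.save_checkpoint

    def save_checkpoint(self, filepath):
        # Record the path instead of writing a file, to keep the sketch simple.
        self.saved.append(filepath)

    def on_validation_end(self, filepath):
        # Ask every checkpoint-like callback to save.
        for cb in [self.checkpoint_callback, *self.callbacks]:
            if cb is not None:
                cb._save_model(filepath)


def try_save(trainer, filepath):
    """Return 'saved' on success, or the error message if saving fails."""
    try:
        trainer.on_validation_end(filepath)
        return "saved"
    except ValueError as err:
        return str(err)
```

With these stand-ins, try_save(SketchTrainer(callbacks=[SketchCheckpoint()]), "a.ckpt") returns the error message, while passing the same object as checkpoint_callback saves normally. Whether the real Trainer behaves this way depends on the installed version.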

Labels: question, won't fix
