ModelCheckpoint save_function() not set? #4079

@celsofranssa

I am training a PyTorch Lightning (PL) model with the following code:

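    from pytorch_lightning import Trainer
    from pytorch_lightning import loggers as pl_loggers
    from pytorch_lightning.callbacks import EarlyStopping, ModelCheckpoint

    # logger
    tb_logger = pl_loggers.TensorBoardLogger(cfg.logs.path, name='rnn_exp')

    # checkpoint callback: keep only the best checkpoint
    checkpoint_callback = ModelCheckpoint(
        filepath=cfg.checkpoint.path + "encoder_rnn{epoch:02d}",
        save_top_k=1,
        mode="min",  # the monitored value is set in the validation step via EvalResult(checkpoint_on=val_loss)
    )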

    # early stopping callback
    early_stopping_callback = EarlyStopping(
        monitor="val_loss",
        patience=cfg.val.patience,
        mode="min"
    )

    tokenizer = ...
    dm = MyDataModule(cfg, tokenizer)

    model = RNNEncoder(cfg)

    trainer = Trainer(
        fast_dev_run=False,
        max_epochs=cfg.train.max_epochs,
        gpus=1,
        logger=tb_logger,
        callbacks=[checkpoint_callback, early_stopping_callback]
    )

    # training
    dm.setup('fit')
    trainer.fit(model, datamodule=dm)
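
A side note on the `filepath` argument in the snippet: `cfg.checkpoint.path + "encoder_rnn{epoch:02d}"` only places checkpoints inside that directory if the configured path ends with a separator. The sketch below uses a hypothetical `build_checkpoint_template` helper and plain Python `str.format` to show the difference. ModelCheckpoint applies its own formatting to the template, so the exact filename it writes may not match the plain `str.format` result.

```python
import os


def build_checkpoint_template(checkpoint_dir: str, prefix: str = "encoder_rnn") -> str:
    """Hypothetical helper: join the checkpoint directory and the filename template."""
    # os.path.join adds a separator only when checkpoint_dir does not already end with one
    return os.path.join(checkpoint_dir, prefix + "{epoch:02d}")


# Plain concatenation without a trailing separator glues the two names together
print("checkpoints" + "encoder_rnn{epoch:02d}")  # checkpointsencoder_rnn{epoch:02d}

template = build_checkpoint_template("checkpoints")
# Under plain str.format, {epoch:02d} zero-pads the epoch to two digits
print(template.format(epoch=3))  # checkpoints/encoder_rnn03 (on POSIX)
```

Using os.path.join gives the same directory whether or not the configured path carries a trailing slash.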

With this setup, training fails after the first epoch with the following error, raised from the ModelCheckpoint callback's on_validation_end hook:

        trainer.fit(model, datamodule=dm)
      File "/home/celso/projects/venvs/semantic_code_search/lib/python3.7/site-packages/pytorch_lightning/trainer/states.py", line 48, in wrapped_fn
        result = fn(self, *args, **kwargs)
      File "/home/celso/projects/venvs/semantic_code_search/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1073, in fit
        results = self.accelerator_backend.train(model)
      File "/home/celso/projects/venvs/semantic_code_search/lib/python3.7/site-packages/pytorch_lightning/accelerators/gpu_backend.py", line 51, in train
        results = self.trainer.run_pretrain_routine(model)
      File "/home/celso/projects/venvs/semantic_code_search/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1239, in run_pretrain_routine
        self.train()
      File "/home/celso/projects/venvs/semantic_code_search/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 394, in train
        self.run_training_epoch()
      File "/home/celso/projects/venvs/semantic_code_search/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 516, in run_training_epoch
        self.run_evaluation(test_mode=False)
      File "/home/celso/projects/venvs/semantic_code_search/lib/python3.7/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 603, in run_evaluation
        self.on_validation_end()
      File "/home/celso/projects/venvs/semantic_code_search/lib/python3.7/site-packages/pytorch_lightning/trainer/callback_hook.py", line 176, in on_validation_end
        callback.on_validation_end(self, self.get_model())
      File "/home/celso/projects/venvs/semantic_code_search/lib/python3.7/site-packages/pytorch_lightning/utilities/distributed.py", line 27, in wrapped_fn
        return fn(*args, **kwargs)
      File "/home/celso/projects/venvs/semantic_code_search/lib/python3.7/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 380, in on_validation_end
        self._do_check_save(filepath, current, epoch, trainer, pl_module)
      File "/home/celso/projects/venvs/semantic_code_search/lib/python3.7/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 421, in _do_check_save
        self._save_model(filepath, trainer, pl_module)
      File "/home/celso/projects/venvs/semantic_code_search/lib/python3.7/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 212, in _save_model
        raise ValueError(".save_function() not set")
    ValueError: .save_function() not set
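
The last frames show where the error originates: `ModelCheckpoint._save_model` raises `ValueError(".save_function() not set")` because no save function is attached to the callback at the moment it tries to write a checkpoint. The sketch below is a toy illustration of that failure mode, not Lightning's code. `ToyCheckpoint` and `ToyTrainer` are hypothetical classes, and the sketch assumes the trainer is what attaches `save_function`, and only for a callback passed through a dedicated argument; whether Lightning's Trainer behaves this way for callbacks given in `callbacks=[...]` is an assumption the sketch does not verify.

```python
from typing import Callable, List, Optional


class ToyCheckpoint:
    """Hypothetical stand-in for a checkpoint callback (not Lightning's ModelCheckpoint)."""

    def __init__(self, filepath: str) -> None:
        self.filepath = filepath
        # Nothing is attached at construction time; an owner must set this later
        self.save_function: Optional[Callable[[str], None]] = None

    def on_validation_end(self, epoch: int) -> None:
        path = self.filepath.format(epoch=epoch)
        if self.save_function is None:
            # Same message as the last line of the traceback
            raise ValueError(".save_function() not set")
        self.save_function(path)


class ToyTrainer:
    """Hypothetical trainer that wires up only a checkpoint passed via a dedicated argument."""

    def __init__(
        self,
        checkpoint_callback: Optional[ToyCheckpoint] = None,
        callbacks: Optional[List[ToyCheckpoint]] = None,
    ) -> None:
        self.saved: List[str] = []
        self.callbacks = list(callbacks or [])
        if checkpoint_callback is not None:
            # Assumption of this toy: the attachment step happens only on this code path
            checkpoint_callback.save_function = self.saved.append
            self.callbacks.append(checkpoint_callback)

    def run_epoch(self, epoch: int) -> None:
        # Call every callback's hook at the end of "validation"
        for cb in self.callbacks:
            cb.on_validation_end(epoch)


# Passed only through `callbacks`: save_function is never attached
try:
    ToyTrainer(callbacks=[ToyCheckpoint("encoder_rnn{epoch:02d}")]).run_epoch(0)
except ValueError as err:
    print(err)  # .save_function() not set

# Passed through the dedicated argument: the save succeeds
trainer = ToyTrainer(checkpoint_callback=ToyCheckpoint("encoder_rnn{epoch:02d}"))
trainer.run_epoch(0)
print(trainer.saved)  # ['encoder_rnn00']
```

The point of the toy is only that the callback cannot save on its own: some owner has to assign save_function before on_validation_end runs.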

Could you tell me if I forgot to configure something?

Labels: bug (Something isn't working), help wanted (Open to be worked on)
