Closed
Labels: bug, help wanted
Description
I am training a PL model using the following code snippet:
# logger
tb_logger = pl_loggers.TensorBoardLogger(cfg.logs.path, name='rnn_exp')
# checkpoint callback
checkpoint_callback = ModelCheckpoint(
    filepath=cfg.checkpoint.path + "encoder_rnn{epoch:02d}",
    save_top_k=1,
    mode="min"  # monitor is defined in val_step: EvalResult(checkpoint_on=val_loss)
)
# early stopping callback
early_stopping_callback = EarlyStopping(
    monitor="val_loss",
    patience=cfg.val.patience,
    mode="min"
)
tokenizer = ...
dm = MyDataModule(cfg, tokenizer)
model = RNNEncoder(cfg)
trainer = Trainer(
    fast_dev_run=False,
    max_epochs=cfg.train.max_epochs,
    gpus=1,
    logger=tb_logger,
    callbacks=[checkpoint_callback, early_stopping_callback]
)
# training
dm.setup('fit')
trainer.fit(model, datamodule=dm)

However, after the first epoch, training fails with the following error, probably raised when the model checkpoint callback is called:
    trainer.fit(model, datamodule=dm)
  File "/home/celso/projects/venvs/semantic_code_search/lib/python3.7/site-packages/pytorch_lightning/trainer/states.py", line 48, in wrapped_fn
    result = fn(self, *args, **kwargs)
  File "/home/celso/projects/venvs/semantic_code_search/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1073, in fit
    results = self.accelerator_backend.train(model)
  File "/home/celso/projects/venvs/semantic_code_search/lib/python3.7/site-packages/pytorch_lightning/accelerators/gpu_backend.py", line 51, in train
    results = self.trainer.run_pretrain_routine(model)
  File "/home/celso/projects/venvs/semantic_code_search/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1239, in run_pretrain_routine
    self.train()
  File "/home/celso/projects/venvs/semantic_code_search/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 394, in train
    self.run_training_epoch()
  File "/home/celso/projects/venvs/semantic_code_search/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 516, in run_training_epoch
    self.run_evaluation(test_mode=False)
  File "/home/celso/projects/venvs/semantic_code_search/lib/python3.7/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 603, in run_evaluation
    self.on_validation_end()
  File "/home/celso/projects/venvs/semantic_code_search/lib/python3.7/site-packages/pytorch_lightning/trainer/callback_hook.py", line 176, in on_validation_end
    callback.on_validation_end(self, self.get_model())
  File "/home/celso/projects/venvs/semantic_code_search/lib/python3.7/site-packages/pytorch_lightning/utilities/distributed.py", line 27, in wrapped_fn
    return fn(*args, **kwargs)
  File "/home/celso/projects/venvs/semantic_code_search/lib/python3.7/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 380, in on_validation_end
    self._do_check_save(filepath, current, epoch, trainer, pl_module)
  File "/home/celso/projects/venvs/semantic_code_search/lib/python3.7/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 421, in _do_check_save
    self._save_model(filepath, trainer, pl_module)
  File "/home/celso/projects/venvs/semantic_code_search/lib/python3.7/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 212, in _save_model
    raise ValueError(".save_function() not set")
ValueError: .save_function() not set

Could you tell me if I forgot to configure something?
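
For anyone reading the traceback: the last frame shows `_save_model` raising because the callback's save function was never assigned. One plausible reading, stated here as an assumption rather than a confirmed diagnosis, is that in this Lightning version the Trainer only wires a save function into the checkpoint callback it receives through a dedicated argument, not into one placed in the generic `callbacks` list. The sketch below mimics that mechanism with hypothetical toy classes (`ToyCheckpoint`, `ToyTrainer`, and the checkpoint file name are all made up for illustration and are not Lightning's API):

```python
class ToyCheckpoint:
    """Hypothetical stand-in for a checkpoint callback (not Lightning's class)."""

    def __init__(self):
        # Nothing is wired yet; a trainer is expected to assign this later.
        self.save_function = None

    def _save_model(self, filepath):
        # Same guard and message as the last frame of the traceback.
        if self.save_function is None:
            raise ValueError(".save_function() not set")
        self.save_function(filepath)


class ToyTrainer:
    """Hypothetical trainer illustrating the assumed wiring behaviour."""

    def __init__(self, checkpoint_callback=None, callbacks=None):
        self.callbacks = list(callbacks or [])
        self.saved_paths = []
        # ASSUMPTION: only the dedicated argument gets a save function;
        # callbacks from the generic list are left untouched.
        if checkpoint_callback is not None:
            checkpoint_callback.save_function = self.save_checkpoint
            self.callbacks.append(checkpoint_callback)

    def save_checkpoint(self, filepath):
        # Record the path instead of writing a real file.
        self.saved_paths.append(filepath)

    def on_validation_end(self):
        # Every registered callback tries to save at the end of validation.
        for callback in self.callbacks:
            callback._save_model("encoder_rnn_epoch=00.ckpt")  # made-up name


def try_run(trainer):
    """Return "ok" if saving worked, otherwise the error message."""
    try:
        trainer.on_validation_end()
        return "ok"
    except ValueError as err:
        return str(err)


# Callback passed through the generic list: never wired, so saving fails.
unwired = try_run(ToyTrainer(callbacks=[ToyCheckpoint()]))
# Callback passed through the dedicated argument: wired, so saving works.
wired = try_run(ToyTrainer(checkpoint_callback=ToyCheckpoint()))

print(unwired)
print(wired)
```

If that assumption holds for the installed version, passing the checkpoint callback through the Trainer's own checkpoint argument instead of the `callbacks` list would be the first thing to try; checking the Trainer documentation for that version should confirm or rule this out.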