
After trainer resume_from_checkpoint checkpoint_callback doesn't restore best metric value #4386

@Vozf

Description


After I restore training with

    pl_model = LightningModel.load_from_checkpoint(str(ckpt_path))

    trainer = Trainer(
        resume_from_checkpoint=str(ckpt_path.name),
        logger=instantiate(cfg.logger, experiment_key=cfg.experiment_id),
        checkpoint_callback=instantiate(cfg.checkpoint.model_checkpoint),
        callbacks=[instantiate(callback) for callback in cfg.callbacks],
    )
    trainer.fit(pl_model)

(`instantiate` comes from facebookresearch/hydra.)
The training process doesn't restore the best metric value and starts tracking it from scratch, so on the first epoch after resuming I get:

INFO:lightning:Epoch 1129: val_mae reached 0.10418 (best 0.10418), saving model to

even though a better checkpoint existed before the interruption.
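Until the callback restores its own state on resume, one workaround is to copy the saved best score back onto the freshly constructed `ModelCheckpoint` before calling `trainer.fit`. The sketch below is a minimal, self-contained illustration of that idea; `StubModelCheckpoint`, `restore_best_score`, and the `"best_model_score"` checkpoint key are illustrative stand-ins, not Lightning's actual API or checkpoint schema:

```python
# Minimal sketch of the workaround: re-seed the checkpoint callback's best
# score after resuming, so the first post-resume epoch is compared against
# the pre-interruption best instead of starting from scratch.

class StubModelCheckpoint:
    """Stand-in for ModelCheckpoint's best-score bookkeeping (hypothetical)."""

    def __init__(self, mode: str = "min"):
        self.mode = mode
        self.best_model_score = None  # lost on resume -- the reported bug

    def is_improvement(self, score: float) -> bool:
        # With no recorded best, any score counts as an improvement,
        # which is exactly the symptom described in this issue.
        if self.best_model_score is None:
            return True
        if self.mode == "min":
            return score < self.best_model_score
        return score > self.best_model_score


def restore_best_score(checkpoint: dict, callback: StubModelCheckpoint) -> None:
    # Copy the saved best metric (loaded e.g. via torch.load) back onto
    # the freshly built callback before training resumes.
    saved = checkpoint.get("best_model_score")
    if saved is not None:
        callback.best_model_score = saved


ckpt = {"best_model_score": 0.09512}  # illustrative pre-interruption best val_mae
cb = StubModelCheckpoint(mode="min")
restore_best_score(ckpt, cb)
print(cb.is_improvement(0.10418))  # worse than the restored best -> False
```

With the best score restored, the 0.10418 value from the log above would no longer be reported as a new best.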


Labels: bug, duplicate, help wanted
