Role of internal hpc_load & hpc_save #4381

@tarepan

Description

❓ Questions and Help

What is your question?

What is the role/responsibility of hpc_load and hpc_save?
Is it the same as that of restore?

Motivation

PL has two internal ways to save/load checkpoints: the restore path and the hpc_load/hpc_save path.
Both do similar dumping/loading, but they use different checkpoint selection mechanisms.
If the two paths have the same role/responsibility with respect to dumping/loading, we could refactor them to share common dump/load code.
There is already some disparity between them (#1947), which could be a potential source of bugs.
The motivation for this question is to understand their roles/responsibilities before refactoring.
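To make the refactoring idea concrete, here is a minimal, hypothetical sketch (not PyTorch Lightning's actual code) in which both paths share one dump/load implementation and differ only in how they select a checkpoint. Every name in it (dump_checkpoint, load_checkpoint, select_restore_path, select_hpc_path, and this simplified hpc_save) is an illustrative assumption. The "hpc_ckpt_<N>.ckpt, highest N wins" selection rule is an assumed stand-in for the HPC path, and pickle stands in for whatever serializer the real code uses.

```python
import os
import pickle
import re
import tempfile
from typing import Any, Dict, Optional, Tuple

# Hypothetical shared helpers: both the "restore" path and the "hpc" path
# would dump/load checkpoints through these two functions.


def dump_checkpoint(state: Dict[str, Any], filepath: str) -> None:
    """Serialize a checkpoint dict to disk (shared by both paths)."""
    with open(filepath, "wb") as f:
        pickle.dump(state, f)


def load_checkpoint(filepath: str) -> Dict[str, Any]:
    """Deserialize a checkpoint dict from disk (shared by both paths)."""
    with open(filepath, "rb") as f:
        return pickle.load(f)


# Only the checkpoint *selection* differs between the two paths.


def select_restore_path(resume_from: Optional[str]) -> Optional[str]:
    """Restore path (assumed): use the file the user explicitly asked for."""
    if resume_from is not None and os.path.isfile(resume_from):
        return resume_from
    return None


# Assumed HPC naming scheme: hpc_ckpt_<N>.ckpt
_HPC_NAME = re.compile(r"^hpc_ckpt_(\d+)\.ckpt$")


def select_hpc_path(folder: str) -> Optional[str]:
    """HPC path (assumed): pick the highest-numbered hpc_ckpt_<N>.ckpt in a folder."""
    best_n, best_file = -1, None
    for name in os.listdir(folder):
        m = _HPC_NAME.match(name)
        if m and int(m.group(1)) > best_n:
            best_n, best_file = int(m.group(1)), name
    return os.path.join(folder, best_file) if best_file else None


def hpc_save(folder: str, state: Dict[str, Any]) -> str:
    """Hypothetical hpc_save: choose the next N, then dump via the shared helper."""
    latest = select_hpc_path(folder)
    n = int(_HPC_NAME.match(os.path.basename(latest)).group(1)) + 1 if latest else 1
    path = os.path.join(folder, f"hpc_ckpt_{n}.ckpt")
    dump_checkpoint(state, path)
    return path


def demo() -> Tuple[str, int]:
    """Save twice via the HPC path, then load the latest with the shared loader."""
    with tempfile.TemporaryDirectory() as d:
        hpc_save(d, {"epoch": 1})
        hpc_save(d, {"epoch": 2})
        latest = select_hpc_path(d)
        return os.path.basename(latest), load_checkpoint(latest)["epoch"]


if __name__ == "__main__":
    print(demo())
```

Parsing N as an integer (rather than sorting filenames as strings) keeps hpc_ckpt_10.ckpt ahead of hpc_ckpt_2.ckpt. With this split, a fix to dump_checkpoint/load_checkpoint would apply to both paths at once.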

What have you tried?

I surveyed the public documentation and the internal code.

hpc_save

No public API (search result in docs); only used internally, in SLURMConnector (search result in repo):
https://github.com/PyTorchLightning/pytorch-lightning/blob/66e58f5afb6ae8702b29ada52f7b022bbf201f9e/pytorch_lightning/trainer/connectors/slurm_connector.py#L88

hpc_load

No public API (search result in docs); only used internally, in CheckpointConnector.hpc_load (search result in repo):
https://github.com/PyTorchLightning/pytorch-lightning/blob/3abfec896212ea85e45d6ac3ccb323ef242d16de/pytorch_lightning/trainer/connectors/checkpoint_connector.py#L202

Metadata

Labels: question (Further information is requested)
Assignees: none