Description
❓ Questions and Help
What is your question?
What is the role/responsibility of hpc_load and hpc_save?
Is it the same as that of restore?
Motivation
pl has two internal ways to save/load checkpoints: the restore way and the hpc_load/hpc_save way.
They do similar dumping/loading but have different checkpoint selection mechanisms.
If these two methods have the same role/responsibility in terms of dumping/loading, we can refactor them to share common dump/load code.
There is already some disparity between them (#1947), which could be a potential source of bugs.
The motivation for this question is to understand their roles/responsibilities before refactoring.
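
To make the refactoring idea concrete, here is a minimal sketch, not the actual pl code. It assumes that the two ways differ only in how a checkpoint file is selected and that the dump/load step itself can be shared. All function names are hypothetical, `pickle` stands in for the real serialization, and the `hpc_ckpt_<n>.ckpt` naming is an assumption made for illustration.

```python
import os
import pickle
import re
import tempfile


def dump_checkpoint(state, path):
    """Shared dump step (hypothetical): serialize a state dict to a file."""
    with open(path, "wb") as f:
        pickle.dump(state, f)


def load_checkpoint(path):
    """Shared load step (hypothetical): read a state dict back from a file."""
    with open(path, "rb") as f:
        return pickle.load(f)


def select_explicit(path):
    """restore-style selection: the caller names the checkpoint file."""
    if path is not None and os.path.isfile(path):
        return path
    return None


def select_latest_hpc(folder):
    """hpc-style selection: pick the highest-numbered hpc_ckpt_<n>.ckpt.

    The file naming scheme is an assumption for this sketch.
    """
    pattern = re.compile(r"hpc_ckpt_(\d+)\.ckpt$")
    numbered = []
    for name in os.listdir(folder):
        match = pattern.match(name)
        if match:
            numbered.append((int(match.group(1)), name))
    if not numbered:
        return None
    # Compare by the integer suffix, so 10 ranks above 2.
    return os.path.join(folder, max(numbered)[1])


def run_demo():
    """Write a few checkpoints, then load one through each selection path."""
    with tempfile.TemporaryDirectory() as folder:
        for n in (1, 2, 10):
            dump_checkpoint({"epoch": n}, os.path.join(folder, f"hpc_ckpt_{n}.ckpt"))
        manual = os.path.join(folder, "manual.ckpt")
        dump_checkpoint({"epoch": 99}, manual)

        hpc_state = load_checkpoint(select_latest_hpc(folder))
        restore_state = load_checkpoint(select_explicit(manual))
        return hpc_state, restore_state
```

In this layout only the two `select_*` functions differ between the two ways. If hpc_save/hpc_load turn out to have extra responsibilities beyond selection, the split would need to be revisited.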
What have you tried?
Surveyed the public documentation and the internal code.
hpc_save
Not a public API (search result in docs); only used internally in SLURMConnector (search result in repo)
https://github.com/PyTorchLightning/pytorch-lightning/blob/66e58f5afb6ae8702b29ada52f7b022bbf201f9e/pytorch_lightning/trainer/connectors/slurm_connector.py#L88
hpc_load
Not a public API (search result in docs); only used internally in CheckpointConnector.hpc_load (search result in repo)
https://github.com/PyTorchLightning/pytorch-lightning/blob/3abfec896212ea85e45d6ac3ccb323ef242d16de/pytorch_lightning/trainer/connectors/checkpoint_connector.py#L202