
[RFC] Deprecate should_rank_save_checkpoint #9074

@ananthsub

Proposed refactoring or deprecation

Now that checkpoint saving is better consolidated in the training type plugin, we no longer need this property: whether a given rank should save becomes an internal implementation detail of the training type plugin.

Users (via the checkpoint callback) only need to call trainer.save_checkpoint and can assume the plugin will handle the surrounding checks for them.
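A minimal sketch of the intended call pattern, under stated assumptions. `SketchTrainingTypePlugin` and `SketchTrainer` are hypothetical stand-ins, not the real Lightning classes, and the rank-zero check and `pickle` serialization are assumptions made only to keep the example self-contained. The point is that the caller never asks whether its rank should save.

```python
import os
import pickle
import tempfile


class SketchTrainingTypePlugin:
    """Hypothetical stand-in for a training type plugin."""

    def __init__(self, global_rank: int = 0) -> None:
        self.global_rank = global_rank

    def save_checkpoint(self, checkpoint: dict, filepath: str) -> None:
        # Assumption: only global rank 0 writes. This check lives inside
        # the plugin, so callers do not need a should_rank_save_checkpoint.
        if self.global_rank != 0:
            return
        with open(filepath, "wb") as f:
            pickle.dump(checkpoint, f)


class SketchTrainer:
    """Hypothetical stand-in for the Trainer."""

    def __init__(self, plugin: SketchTrainingTypePlugin) -> None:
        self.training_type_plugin = plugin

    def save_checkpoint(self, filepath: str) -> None:
        # Placeholder checkpoint contents, for illustration only.
        checkpoint = {"epoch": 0, "state_dict": {}}
        self.training_type_plugin.save_checkpoint(checkpoint, filepath)


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    for rank in (0, 1):
        path = os.path.join(out_dir, f"rank{rank}.ckpt")
        # The same call is made on every rank; the plugin decides what to do.
        SketchTrainer(SketchTrainingTypePlugin(rank)).save_checkpoint(path)
        print(rank, os.path.exists(path))
```

With this shape, a callback can call `save_checkpoint` unconditionally on every rank.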

Motivation

  • Ensure consolidation of saving logic in one place (e.g. all fsspec code for checkpoint paths should sit in one place vs. being scattered around the codebase)
  • API simplification: fewer properties exposed on the Trainer

Pitch

Deprecate this property on the Trainer: https://github.com/PyTorchLightning/pytorch-lightning/blob/538e743f17c7da4624c902f762922e2837661818/pytorch_lightning/trainer/properties.py#L117-L120

Deprecate the corresponding property on the training type plugin: https://github.com/PyTorchLightning/pytorch-lightning/blob/538e743f17c7da4624c902f762922e2837661818/pytorch_lightning/plugins/training_type/training_type_plugin.py#L313-L316

Move the directory creation from here: https://github.com/PyTorchLightning/pytorch-lightning/blob/538e743f17c7da4624c902f762922e2837661818/pytorch_lightning/callbacks/model_checkpoint.py#L511-L514

Into here: https://github.com/PyTorchLightning/pytorch-lightning/blob/538e743f17c7da4624c902f762922e2837661818/pytorch_lightning/plugins/io/torch_plugin.py#L30-L41

@SeanNaren - this is where we should also have a rm_checkpoint on the Checkpoint IO plugin such that we can deprecate this from the model checkpoint: https://github.com/PyTorchLightning/pytorch-lightning/blob/538e743f17c7da4624c902f762922e2837661818/pytorch_lightning/callbacks/model_checkpoint.py#L502-L505
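A rough sketch of how directory creation and removal could both sit on the IO side. `SketchCheckpointIO` is a hypothetical class, not the actual Checkpoint IO plugin; it uses `os` and `pickle` from the standard library as stand-ins for the fsspec and torch calls in the real code, and the exact signature of `rm_checkpoint` is an assumption.

```python
import os
import pickle
import tempfile


class SketchCheckpointIO:
    """Hypothetical stand-in for a checkpoint IO plugin."""

    def save_checkpoint(self, checkpoint: dict, path: str) -> None:
        # Directory creation happens here rather than in the callback.
        dirpath = os.path.dirname(path)
        if dirpath:
            os.makedirs(dirpath, exist_ok=True)
        with open(path, "wb") as f:
            pickle.dump(checkpoint, f)

    def rm_checkpoint(self, path: str) -> None:
        # Assumed behavior: removing a missing checkpoint is a no-op.
        if os.path.exists(path):
            os.remove(path)


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    ckpt_path = os.path.join(root, "nested", "dir", "model.ckpt")
    io_plugin = SketchCheckpointIO()
    io_plugin.save_checkpoint({"epoch": 1}, ckpt_path)
    print(os.path.isfile(ckpt_path))
    io_plugin.rm_checkpoint(ckpt_path)
    print(os.path.exists(ckpt_path))
```

Keeping both operations on one object means the model checkpoint callback would not need any filesystem code of its own.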

One thing I'm not sure about: because should_rank_save_checkpoint is exposed from the accelerator to the trainer as part of the public trainer API, does this need to go through a full deprecation cycle? Or is a breaking change as part of the plugins API permissible?
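For reference, if a full deprecation cycle is chosen, the property could keep working while emitting a warning. This is only a sketch: `SketchPlugin` and `SketchTrainer` are hypothetical classes, the rank-zero logic and the warning text are assumptions, and it uses the standard-library `warnings` module rather than any Lightning helper.

```python
import warnings


class SketchPlugin:
    """Hypothetical stand-in for a training type plugin."""

    def __init__(self, global_rank: int = 0) -> None:
        self.global_rank = global_rank

    @property
    def should_rank_save_checkpoint(self) -> bool:
        # Assumption: only global rank 0 saves.
        return self.global_rank == 0


class SketchTrainer:
    """Hypothetical stand-in for the Trainer."""

    def __init__(self, plugin: SketchPlugin) -> None:
        self.plugin = plugin

    @property
    def should_rank_save_checkpoint(self) -> bool:
        # Keep returning the old value, but tell callers it is going away.
        warnings.warn(
            "`should_rank_save_checkpoint` is deprecated; call"
            " `save_checkpoint` and let the plugin handle rank checks.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.plugin.should_rank_save_checkpoint


if __name__ == "__main__":
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        value = SketchTrainer(SketchPlugin(0)).should_rank_save_checkpoint
    print(value, [w.category.__name__ for w in caught])
```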

