
Conversation

@SeanNaren (Contributor) commented Nov 14, 2020

What does this PR do?

Ties to #4178

Allows a custom DDP plugin to hook into saving the optimizer state. This is required for optimizer state sharding, since the optimizer state must be consolidated from all processes before saving.
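For context, here is a minimal sketch of what such a plugin hook could look like. The method name `optimizer_state` and the fairscale-style `consolidate_state_dict()` call are illustrative assumptions, not necessarily the exact diff:

```python
# Illustrative sketch, not the PR's verbatim diff: the hook name and the
# fairscale-style consolidate_state_dict() call are assumptions.
from typing import Any, Dict

from torch.optim import Optimizer


class DDPPlugin:
    def optimizer_state(self, optimizer: Optimizer) -> Dict[str, Any]:
        # Default: the local optimizer state is already complete.
        return optimizer.state_dict()


class ShardedDDPPlugin(DDPPlugin):
    def optimizer_state(self, optimizer: Optimizer) -> Dict[str, Any]:
        # With a sharded optimizer (e.g. fairscale's OSS), each rank holds
        # only a shard of the state, so gather it across processes before
        # the checkpoint is written.
        optimizer.consolidate_state_dict()
        return optimizer.state_dict()
```

Routing state saving through the plugin lets a sharded implementation gather its shards before the checkpoint is written, while the default path stays a plain `state_dict()` call.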

cc @ananthsub @awaelchli

PR review

Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing, make sure you have read the Review guidelines. In short, see the following bullet list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified; bugfixes should be included in bug-fix release milestones (m.f.X) and features in (m.X.b) releases.

Did you have fun?

Make sure you had fun coding 🙃

@codecov codecov bot commented Nov 14, 2020

Codecov Report

Merging #4675 (d72144b) into master (8283680) will leave coverage unchanged at 93%.
The diff coverage is 100%.

@@          Coverage Diff           @@
##           master   #4675   +/-   ##
======================================
  Coverage      93%     93%           
======================================
  Files         117     117           
  Lines        8941    8949    +8     
======================================
+ Hits         8311    8319    +8     
  Misses        630     630           

@tchaton (Contributor) left a comment

LGTM !

@SeanNaren SeanNaren added this to the 1.1 milestone Nov 14, 2020
@SeanNaren SeanNaren changed the title Sharded Accelerator 2/n: Allow ddp plugin to modify optimizer state saving Sharded Plugin 2/n: Allow ddp plugin to modify optimizer state saving Nov 15, 2020
@SeanNaren SeanNaren added the distributed Generic distributed-related topic label Nov 15, 2020
@Borda Borda added the feature Is an improvement or enhancement label Nov 18, 2020
@Borda (Collaborator) left a comment

lgtm

@williamFalcon (Contributor) left a comment

dope

(Merge conflicts resolved in pytorch_lightning/plugins/ddp_plugin.py)
@SeanNaren SeanNaren merged commit e7134a9 into master Nov 18, 2020
@SeanNaren SeanNaren deleted the feature/817-fairscale-2n branch November 18, 2020 16:38
rohitgr7 pushed a commit that referenced this pull request Nov 21, 2020
Sharded Plugin 2/n: Allow ddp plugin to modify optimizer state saving (#4675)

* Allow ddp plugin to modify optimizer state saving

* Rely on the accelerator for optimizer states

* Ensure we init the accelerator for the saving function

* Better comment for optim state dump

* Revert "Ensure we init the accelerator for the saving function"

This reverts commit af65eff

* Added accelerator check to initialize tuner before saving model checkpoint

* Simplify comment

* Revert "Added accelerator check to initialize tuner before saving model checkpoint"

This reverts commit f9929c0

* Return single optimizer state to reduce duplication

* Fixed docstring

* Fixed typing

* Fixed comment

* Added CHANGELOG.md

Co-authored-by: chaton <[email protected]>

Labels

distributed (Generic distributed-related topic)
feature (Is an improvement or enhancement)

Projects

None yet


6 participants