
Conversation

@SeanNaren (Contributor) commented Nov 14, 2020

What does this PR do?

Ties to #4178

Allows a custom DDP plugin to hook into saving the optimizer state. This is required for optimizer state sharding, since the optimizer state must be consolidated from all processes before saving.
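For context, here is a minimal sketch of what such a plugin hook could look like. The method name `optimizer_state` and the fairscale-style `consolidate_state_dict()` call are illustrative assumptions, not necessarily the exact diff:

```python
# Illustrative sketch, not the PR's verbatim diff: the hook name and the
# fairscale-style consolidate_state_dict() call are assumptions.
from typing import Any, Dict

from torch.optim import Optimizer


class DDPPlugin:
    def optimizer_state(self, optimizer: Optimizer) -> Dict[str, Any]:
        # Default: the local optimizer state is already complete.
        return optimizer.state_dict()


class ShardedDDPPlugin(DDPPlugin):
    def optimizer_state(self, optimizer: Optimizer) -> Dict[str, Any]:
        # With a sharded optimizer (e.g. fairscale's OSS), each rank holds
        # only a shard of the state, so gather it across processes before
        # the checkpoint is written.
        optimizer.consolidate_state_dict()
        return optimizer.state_dict()
```

Routing state saving through the plugin lets a sharded implementation gather its shards before the checkpoint is written, while the default path stays a plain `state_dict()` call.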

cc @ananthsub @awaelchli

PR review

Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing, make sure you have read the Review guidelines. In short, see the following bullet list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified; bugfixes should be included in bug-fix release milestones (m.f.X) and features in (m.X.b) releases.

Did you have fun?

Make sure you had fun coding 🙃

@codecov codecov bot commented Nov 14, 2020

Codecov Report

Merging #4675 (d72144b) into master (8283680) will leave coverage unchanged at 93%.
The diff coverage is 100%.

@@          Coverage Diff           @@
##           master   #4675   +/-   ##
======================================
  Coverage      93%     93%           
======================================
  Files         117     117           
  Lines        8941    8949    +8     
======================================
+ Hits         8311    8319    +8     
  Misses        630     630           

@tchaton (Contributor) left a comment

LGTM !

@SeanNaren SeanNaren added this to the 1.1 milestone Nov 14, 2020
@SeanNaren SeanNaren changed the title Sharded Accelerator 2/n: Allow ddp plugin to modify optimizer state saving Sharded Plugin 2/n: Allow ddp plugin to modify optimizer state saving Nov 15, 2020
@SeanNaren SeanNaren added the distributed Generic distributed-related topic label Nov 15, 2020
@Borda Borda added the feature Is an improvement or enhancement label Nov 18, 2020
@Borda (Collaborator) left a comment

lgtm

@williamFalcon (Contributor) left a comment

dope

(Merge conflicts resolved in pytorch_lightning/plugins/ddp_plugin.py)
@SeanNaren SeanNaren merged commit e7134a9 into master Nov 18, 2020
@SeanNaren SeanNaren deleted the feature/817-fairscale-2n branch November 18, 2020 16:38
rohitgr7 pushed a commit that referenced this pull request Nov 21, 2020
Sharded Plugin 2/n: Allow ddp plugin to modify optimizer state saving (#4675)

* Allow ddp plugin to modify optimizer state saving

* Rely on the accelerator for optimizer states

* Ensure we init the accelerator for the saving function

* Better comment for optim state dump

* Revert "Ensure we init the accelerator for the saving function"

This reverts commit af65eff

* Added accelerator check to initialize tuner before saving model checkpoint

* Simplify comment

* Revert "Added accelerator check to initialize tuner before saving model checkpoint"

This reverts commit f9929c0

* Return single optimizer state to reduce duplication

* Fixed docstring

* Fixed typing

* Fixed comment

* Added CHANGELOG.md

Co-authored-by: chaton <[email protected]>

Labels

distributed (Generic distributed-related topic)
feature (Is an improvement or enhancement)

Projects

None yet


6 participants