
Conversation

@SeanNaren (Contributor) commented Mar 3, 2021

What does this PR do?

Allows the training type plugin to delay optimizer creation. This is useful when the optimizers can only be created after the model has been wrapped. In the case of FSDP, after the model has been sharded onto devices.
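
As a rough sketch of what delaying optimizer creation looks like, consider the following. The property name setup_optimizers_in_pre_dispatch is taken from the review discussion further down; the FullyShardedPlugin override and the rest of the code are illustrative assumptions, not the exact merged implementation.

    class TrainingTypePlugin:
        @property
        def setup_optimizers_in_pre_dispatch(self) -> bool:
            """Return True to delay optimizer creation until pre_dispatch,
            i.e. until after the plugin has wrapped the model."""
            return False


    class FullyShardedPlugin(TrainingTypePlugin):
        @property
        def setup_optimizers_in_pre_dispatch(self) -> bool:
            # FSDP shards the parameters while wrapping the model, so the
            # optimizers must be created afterwards so that they reference
            # the sharded parameters.
            return True

A boolean property like this lets the accelerator keep a single code path and simply ask the plugin whether to call setup_optimizers during setup or later, during pre_dispatch.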

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

@SeanNaren SeanNaren added feature Is an improvement or enhancement distributed Generic distributed-related topic labels Mar 3, 2021
@SeanNaren SeanNaren added this to the 1.3 milestone Mar 3, 2021
codecov bot commented Mar 3, 2021

Codecov Report

Merging #6331 (0b6a659) into master (4e9b453) will increase coverage by 45%.
The diff coverage is 100%.

❗ Current head 0b6a659 differs from pull request most recent head 4b94920. Consider uploading reports for the commit 4b94920 to get more accurate results

@@           Coverage Diff            @@
##           master   #6331     +/-   ##
========================================
+ Coverage      46%     91%    +45%     
========================================
  Files         167     160      -7     
  Lines       10722   11407    +685     
========================================
+ Hits         4905   10377   +5472     
+ Misses       5817    1030   -4787     

"""
self.connect_training_type_plugin(self.training_type_plugin, model)
self.setup_optimizers(trainer)
if not self.training_type_plugin.setup_optimizers_after_dispatch:
A contributor commented:

Hey @SeanNaren

  1. Wouldn't it be better to move self.setup_optimizers(trainer) into dispatch directly, or is there something blocking that?
  2. If we stick with setup_optimizers_after_dispatch, wouldn't setup_optimizers_in_pre_dispatch be a clearer name?

@SeanNaren (Contributor, author) replied Mar 10, 2021:

Honestly, I never thought about it; that's a really good point. Let me move it and see what happens.

EDIT: The only edge case I see is that Sharded Training (not FSDP) requires re-configuring the optimizers, which might need a small refactor. Since no hooks are exposed to the user at this stage, I think we're fine to do so. I'll make the change and see if it works out.

@SeanNaren (Contributor, author) added:

After #6506 I understand this piece of code a bit better.

Because Apex wraps the optimizers, they have to be set up before the precision plugin is set up.

So I'll update the docstrings to make it clear that enabling setup_optimizers_in_pre_dispatch may break Apex.
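
To make the ordering concrete, here is a schematic sketch. Only connect_training_type_plugin, setup_optimizers, and the setup_optimizers_in_pre_dispatch name come from this PR's discussion; the connect_precision_plugin call and the exact pre_dispatch placement are assumptions for illustration, not the merged code.

    class Accelerator:
        def setup(self, trainer, model):
            self.connect_training_type_plugin(self.training_type_plugin, model)
            # Create the optimizers here unless the plugin delays them...
            if not self.training_type_plugin.setup_optimizers_in_pre_dispatch:
                self.setup_optimizers(trainer)
            # Apex wraps the optimizers while the precision plugin is set up,
            # so delaying optimizer creation past this point can break Apex.
            self.connect_precision_plugin(self.precision_plugin)

        def pre_dispatch(self, trainer):
            # ...otherwise create them here, after the training type plugin
            # has wrapped (and, for FSDP, sharded) the model.
            if self.training_type_plugin.setup_optimizers_in_pre_dispatch:
                self.setup_optimizers(trainer)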

Another contributor asked:

Total n00b question: is it possible to delay setting up both the optimizers and the precision plugin until pre-dispatch to make Apex work?

Merge commit resolving conflicts in pytorch_lightning/accelerators/accelerator.py.
@mergify mergify bot removed the has conflicts label Mar 18, 2021
@tchaton (Contributor) left a review:

LGTM !

@SeanNaren SeanNaren enabled auto-merge (squash) March 22, 2021 11:20
@SeanNaren SeanNaren merged commit 58c9fa7 into master Mar 22, 2021
@SeanNaren SeanNaren deleted the feat/fsdp_2n branch March 22, 2021 11:43