Allow training type plugin to delay optimizer creation (FSDP 2/n) #6331
Conversation
Codecov Report
Coverage Diff:

| | master | #6331 | +/- |
| --- | --- | --- | --- |
| Coverage | 46% | 91% | +45% |
| Files | 167 | 160 | -7 |
| Lines | 10722 | 11407 | +685 |
| Hits | 4905 | 10377 | +5472 |
| Misses | 5817 | 1030 | -4787 |
| """ | ||
| self.connect_training_type_plugin(self.training_type_plugin, model) | ||
| self.setup_optimizers(trainer) | ||
| if not self.training_type_plugin.setup_optimizers_after_dispatch: |
Hey @SeanNaren

- Wouldn't it be better to move `self.setup_optimizers(trainer)` into dispatch directly, or is there a blocking part?
- If we stick with `setup_optimizers_after_dispatch`, wouldn't it be clearer to name it `setup_optimizers_in_pre_dispatch`?
I never thought about it honestly, that's a really good point lol. Let me just move it and see what happens.
EDIT: The only edge case I see is that Sharded Training (not FSDP) requires re-configuring the optimizers, which might need a small refactor. Since no hooks are exposed to the user at this stage, I think we're fine to do so. Will make the change and see if it works out.
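For context on the hook being discussed, here is a minimal, self-contained sketch (not the actual Lightning classes) of the control flow a `setup_optimizers_in_pre_dispatch` property would describe: the plugin exposes a boolean, and the accelerator consults it both during setup and during pre-dispatch.

```python
class TrainingTypePlugin:
    @property
    def setup_optimizers_in_pre_dispatch(self) -> bool:
        """Return True to delay optimizer creation until pre-dispatch,
        i.e. after the plugin has wrapped/sharded the model."""
        return False


class ShardedLikePlugin(TrainingTypePlugin):
    """Hypothetical plugin that needs the wrapped model before building optimizers."""

    @property
    def setup_optimizers_in_pre_dispatch(self) -> bool:
        return True


class Accelerator:
    def __init__(self, training_type_plugin: TrainingTypePlugin):
        self.training_type_plugin = training_type_plugin

    def setup_optimizers(self, trainer):
        print("building optimizers")

    def setup(self, trainer, model):
        # eager path: create the optimizers up front, as before
        if not self.training_type_plugin.setup_optimizers_in_pre_dispatch:
            self.setup_optimizers(trainer)

    def pre_dispatch(self, trainer):
        # deferred path: the model has been wrapped/sharded by now
        if self.training_type_plugin.setup_optimizers_in_pre_dispatch:
            self.setup_optimizers(trainer)
```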
After #6506 I understand this piece of code a bit better.
Because Apex wraps the optimizers, they have to be set up before the precision plugin is set up.
So I'll update the docstrings to make it clear that enabling `setup_optimizers_in_pre_dispatch` may break Apex.
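For readers unfamiliar with the constraint: Apex's `amp.initialize` receives already-constructed optimizers and returns wrapped ones, so precision setup cannot run before the optimizers exist. A rough illustration (assumes NVIDIA Apex and a CUDA device are available; the surrounding code is illustrative, not Lightning internals):

```python
import torch
from torch import nn
from apex import amp  # NVIDIA Apex must be installed

model = nn.Linear(8, 2).cuda()
# the optimizer must already exist here ...
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# ... because amp.initialize wraps it in place of the original
model, optimizer = amp.initialize(model, optimizer, opt_level="O2")

# If optimizer creation were deferred to pre-dispatch, the call above would have
# nothing to wrap, which is why deferring is documented as incompatible with Apex.
```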
total n00b question: is it possible to delay setting up both the optimizers and the precision plugin until pre-dispatch to make Apex work?
tchaton left a comment:
LGTM!
What does this PR do?
Allows the training type plugin to delay optimizer creation. This is useful when the optimizers can only be created after the model has been wrapped; in the case of FSDP, that is after the model has been sharded onto devices.
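As a rough illustration of the motivation (a minimal sketch, not the actual FSDP plugin; `MyShardedPlugin` and `wrap_model` are hypothetical stand-ins), the optimizer is built only after wrapping so that it references the wrapped/sharded parameters rather than the originals:

```python
import torch
from torch import nn


class MyShardedPlugin:
    """Hypothetical plugin that shards/wraps the model before optimizers exist."""

    @property
    def setup_optimizers_in_pre_dispatch(self) -> bool:
        # optimizers must see the wrapped parameters, so defer their creation
        return True

    def wrap_model(self, model: nn.Module) -> nn.Module:
        # stand-in for the real sharding step (e.g. FullyShardedDataParallel)
        return model


def pre_dispatch(plugin: MyShardedPlugin, model: nn.Module):
    wrapped = plugin.wrap_model(model)
    # built only now, so it holds references to the wrapped model's parameters
    optimizer = torch.optim.Adam(wrapped.parameters(), lr=1e-3)
    return wrapped, optimizer
```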