Optimization docs #6907
Conversation
```python
def closure():
    # Only zero_grad on the first batch to accumulate gradients
    is_first_batch_to_accumulate = batch_idx % 2 == 0
    if is_first_batch_to_accumulate:
        opt.zero_grad()
```
Removing this gradient accumulation in a closure here.
When using LBFGS-like optimizers, which require a closure function to re-evaluate the model, the closure always has to run zero_grad for them to work properly, so I think "gradient accumulation + closure" is not supported in PL, just as it isn't in plain PyTorch.
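For context, here is a rough sketch of what a closure for an LBFGS-like optimizer ends up looking like under manual optimization; the model class, layer, and learning rate below are made up for illustration, and the point is only that ``zero_grad`` runs unconditionally inside the closure because LBFGS may call it several times per step:

```python
import torch
import pytorch_lightning as pl


class LitLBFGSModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.automatic_optimization = False  # manual optimization
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()

        def closure():
            # LBFGS may re-evaluate the model several times per step,
            # so the closure must always zero the gradients itself
            opt.zero_grad()
            x, y = batch
            loss = torch.nn.functional.mse_loss(self.layer(x), y)
            self.manual_backward(loss)
            return loss

        opt.step(closure=closure)

    def configure_optimizers(self):
        return torch.optim.LBFGS(self.parameters(), lr=0.1)
```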
```rst
----------

Using the closure functions for optimization
```
I removed this section ("Using the closure functions for optimization") because
- from 1.2.2, this isn't needed even when using LBFGS-like optimizers
- it doesn't include ``zero_grad`` in the closure function, so this example is wrong anyway
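For reference, this is roughly why the section is no longer needed since 1.2.2, as far as I understand: with automatic optimization Lightning builds the closure (training step, zero_grad, backward) internally, so an LBFGS-like optimizer needs nothing special in ``training_step``. The class and layer names below are made up for illustration:

```python
import torch
import pytorch_lightning as pl


class AutoLBFGSModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        # automatic optimization: Lightning wraps training_step + zero_grad +
        # backward into the closure it passes to LBFGS itself
        return torch.nn.functional.mse_loss(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.LBFGS(self.parameters(), lr=0.1)
```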
Borda
left a comment
lgtm 🐰
awaelchli
left a comment
love this doc update! thanks
carmocca
left a comment
Awesome work! We really needed this
Co-authored-by: Carlos Mocholí <[email protected]>
Co-authored-by: Adrian Wälchli <[email protected]>
docs/source/common/optimizers.rst
Outdated
```rst
* ``self.optimizers()`` will return :class:`~pytorch_lightning.core.optimizer.LightningOptimizer` objects. You can
  access your own optimizer with ``optimizer.optimizer``. However, if you use your own optimizer to perform a step,
  Lightning won't be able to support accelerators and precision for you.
* Be careful where you call ``optimizer.zero_grad()``, or your model won't converge.
  It is good practice to call ``optimizer.zero_grad()`` before ``self.manual_backward(loss)``.
```
Remove this now that we have it at the bottom?
```diff
-* ``self.optimizers()`` will return :class:`~pytorch_lightning.core.optimizer.LightningOptimizer` objects. You can
-  access your own optimizer with ``optimizer.optimizer``. However, if you use your own optimizer to perform a step,
-  Lightning won't be able to support accelerators and precision for you.
-* Be careful where you call ``optimizer.zero_grad()``, or your model won't converge.
-  It is good practice to call ``optimizer.zero_grad()`` before ``self.manual_backward(loss)``.
+Be careful where you call ``optimizer.zero_grad()``, or your model won't converge.
+It is good practice to call ``optimizer.zero_grad()`` before ``self.manual_backward(loss)``.
```
I would leave it in both the manual and automatic optimization sections, as one might only look at the section of their interest, but the decision is yours. Should I apply your suggestion here?
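As a side note, the ordering the bullet recommends reads roughly like this in a manual-optimization ``training_step`` (the ``compute_loss`` helper is hypothetical, just to keep the sketch short):

```python
def training_step(self, batch, batch_idx):
    opt = self.optimizers()
    opt.zero_grad()                  # clear old gradients before computing new ones
    loss = self.compute_loss(batch)  # hypothetical loss helper
    self.manual_backward(loss)       # backward with accelerator/precision support
    opt.step()
```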
```diff
-    if batch_idx % 2 == 0 :
-        optimizer.step(closure=optimizer_closure)
-        optimizer.zero_grad()
+    optimizer.step(closure=optimizer_closure)

-# update discriminator opt every 4 steps
+# update discriminator opt every 2 steps
 if optimizer_idx == 1:
-    if batch_idx % 4 == 0 :
+    if (batch_idx + 1) % 2 == 0 :
```
Just as a note, on every step one of the two optimizers needs to update its parameters. Otherwise, half of the training dataset will be ignored in this case...
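To make that concrete, here is a rough sketch of the schedule the updated example aims for, where the generator steps on every batch and the discriminator every second batch, so no batch is wasted (the hook signature below is the 1.3-era ``optimizer_step`` API and may differ in other versions):

```python
def optimizer_step(self, epoch, batch_idx, optimizer, optimizer_idx,
                   optimizer_closure, on_tpu, using_native_amp, using_lbfgs):
    # generator: step on every batch
    if optimizer_idx == 0:
        optimizer.step(closure=optimizer_closure)

    # discriminator: step every 2 batches, so each batch is still
    # consumed by at least one optimizer
    if optimizer_idx == 1:
        if (batch_idx + 1) % 2 == 0:
            optimizer.step(closure=optimizer_closure)
```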
```rst
.. tip:: In manual mode we still automatically clip grads if Trainer(gradient_clip_val=x) is set

See :ref:`manual optimization<common/optimizers:Manual optimization>` for more examples.

.. tip:: In manual mode we still automatically accumulate grad over batches if
```
As pointed out by @rubencart, this tip is outdated.
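Since accumulation is no longer automatic in manual mode (if I read the change right), accumulating over two batches would now be written out by hand, roughly like this (``compute_loss`` and the factor of 2 are just for illustration):

```python
def training_step(self, batch, batch_idx):
    opt = self.optimizers()

    # scale the loss so the accumulated gradient matches a 2-batch average
    loss = self.compute_loss(batch) / 2  # hypothetical loss helper
    self.manual_backward(loss)

    # step and reset only every 2 batches; gradients accumulate in between
    if (batch_idx + 1) % 2 == 0:
        opt.step()
        opt.zero_grad()
```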
```rst
.. warning::
   * Before 1.3, Lightning automatically called ``lr_scheduler.step()`` in both automatic and manual optimization. From
     1.3, ``lr_scheduler.step()`` is now for the user to call at arbitrary intervals.
```
The manual lr_scheduler.step PR (#6825) couldn't make it to 1.3? If so, this needs to be updated.
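For reference, if #6825 did make it in, manual scheduler stepping would look roughly like this sketch; I'm assuming the ``self.lr_schedulers()`` helper from that change and the 1.3-era ``training_epoch_end`` hook:

```python
def training_epoch_end(self, outputs):
    # step the scheduler manually, e.g. once per epoch,
    # at whatever interval the user chooses
    sch = self.lr_schedulers()
    sch.step()
```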
In my opinion they should both make it in
cc @edenlightning
@edenlightning @carmocca are we set here?
tchaton
left a comment
LGTM !
What does this PR do?
Fixes #<issue_number>. Follow-up of #6825 (comment). Also requested by the community in #5780.
Updated docs links (23803e3)
Optimization page
Current: https://pytorch-lightning.readthedocs.io/en/latest/common/optimizers.html
Updated: https://108739-178626720-gh.circle-artifacts.com/0/html/common/optimizers.html
LightningModule page
Current: https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html
Updated: https://108739-178626720-gh.circle-artifacts.com/0/html/common/lightning_module.html
Motivation
In my opinion, the current docs about optimization (especially manual optimization) are somewhat messy. There are quite a few tips/notes/warnings scattered around at random. I also found a few outdated or wrong examples in the docs, so I'll try to address them here.
Description of the changes
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing, make sure you have read the Review guidelines.
Did you have fun?
Make sure you had fun coding 🙃