
Conversation

Contributor

@akihironitta akihironitta commented Apr 8, 2021

What does this PR do?

Fixes #<issue_number> Follow-up of #6825 (comment). Also requested by the community in #5780.


Updated docs links (23803e3)

Optimization page
Current: https://pytorch-lightning.readthedocs.io/en/latest/common/optimizers.html
Updated: https://108739-178626720-gh.circle-artifacts.com/0/html/common/optimizers.html

LightningModule page
Current: https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html
Updated: https://108739-178626720-gh.circle-artifacts.com/0/html/common/lightning_module.html


Motivation

In my opinion, the current docs about optimization (especially manual optimization) are somewhat messy: there are quite a number of tip/note/warning blocks scattered almost at random. I also found a few outdated or incorrect examples in the docs, so I'll try to address them here.


Description of the changes

  • Restructure the optimization page
  • Verify that examples are correct
  • Other small fixes/improvements (typos, indentation, ...)

Before submitting

  • [n/a] Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • [n/a] Did you make sure to update the documentation with your changes? (if necessary)
  • [n/a] Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • [n/a] Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing, make sure you have read the Review guidelines. In short, see the following bullet list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

@akihironitta akihironitta added feature Is an improvement or enhancement docs Documentation related labels Apr 8, 2021
@akihironitta akihironitta added this to the 1.3 milestone Apr 8, 2021
Comment on lines -79 to -83
def closure():
    # Only zero_grad on the first batch to accumulate gradients
    is_first_batch_to_accumulate = batch_idx % 2 == 0
    if is_first_batch_to_accumulate:
        opt.zero_grad()
Contributor Author

@akihironitta akihironitta Apr 9, 2021


Removing this gradient accumulation in a closure here.

When using LBFGS-like optimizers, which require a closure function to reevaluate the model, the closure always has to run zero_grad for them to work properly, so I think "gradient accumulation + closure" is not supported in PL, just as it isn't in PyTorch.
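For reference, here is a minimal sketch of what that looks like (not the removed docs example; the module and layer names are made up, and it assumes the ~1.3-era manual optimization API), where the closure has to call zero_grad on every evaluation:

import torch
import torch.nn.functional as F
from pytorch_lightning import LightningModule


class LitLBFGS(LightningModule):  # hypothetical example module
    def __init__(self):
        super().__init__()
        self.automatic_optimization = False  # manual optimization
        self.layer = torch.nn.Linear(32, 1)

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()
        x, y = batch

        def closure():
            # LBFGS may call this several times per step, so gradients must be
            # zeroed on every call; accumulating gradients across batches
            # inside the closure is therefore not possible
            opt.zero_grad()
            loss = F.mse_loss(self.layer(x), y)
            self.manual_backward(loss)
            return loss

        opt.step(closure=closure)

    def configure_optimizers(self):
        return torch.optim.LBFGS(self.parameters(), lr=0.1)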


----------

Using the closure functions for optimization
Contributor Author

@akihironitta akihironitta Apr 11, 2021


I removed this section ("Using the closure functions for optimization") because:

  1. Since 1.2.2, it isn't needed even when using LBFGS-like optimizers.
  2. It doesn't include zero_grad in the closure function, so the example is wrong anyway.
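For context, a minimal sketch of what this means in practice (module and layer names are made up): with automatic optimization in 1.2.2+, Lightning builds the closure from training_step/backward and passes it to the optimizer itself, so LBFGS-like optimizers work without any user-defined closure.

import torch
import torch.nn.functional as F
from pytorch_lightning import LightningModule


class LitAutoLBFGS(LightningModule):  # hypothetical example module
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        # returning the loss is enough; Lightning wraps this step in a
        # closure internally and hands it to the optimizer
        return F.mse_loss(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.LBFGS(self.parameters(), lr=0.1)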

@akihironitta akihironitta changed the title Update optimization docs [WIP] [RFC][WIP] Optimization docs Apr 11, 2021
@akihironitta akihironitta changed the title [RFC][WIP] Optimization docs Optimization docs Apr 13, 2021
@akihironitta akihironitta marked this pull request as ready for review April 13, 2021 21:09
Collaborator

@Borda Borda left a comment


lgtm 🐰

@Borda Borda requested review from a team, SeanNaren and carmocca April 13, 2021 21:14
Contributor

@awaelchli awaelchli left a comment


love this doc update! thanks

Contributor

@carmocca carmocca left a comment


Awesome work! We really needed this

akihironitta and others added 2 commits April 15, 2021 11:31
@akihironitta akihironitta requested a review from carmocca April 15, 2021 02:47
Comment on lines 62 to 66
* ``self.optimizers()`` will return :class:`~pytorch_lightning.core.optimizer.LightningOptimizer` objects. You can
access your own optimizer with ``optimizer.optimizer``. However, if you use your own optimizer to perform a step,
Lightning won't be able to support accelerators and precision for you.
* Be careful where you call ``optimizer.zero_grad()``, or your model won't converge.
It is good practice to call ``optimizer.zero_grad()`` before ``self.manual_backward(loss)``.
Contributor


Remove this now that we have it at the bottom?

Suggested change
- * ``self.optimizers()`` will return :class:`~pytorch_lightning.core.optimizer.LightningOptimizer` objects. You can
-   access your own optimizer with ``optimizer.optimizer``. However, if you use your own optimizer to perform a step,
-   Lightning won't be able to support accelerators and precision for you.
- * Be careful where you call ``optimizer.zero_grad()``, or your model won't converge.
-   It is good practice to call ``optimizer.zero_grad()`` before ``self.manual_backward(loss)``.
+ Be careful where you call ``optimizer.zero_grad()``, or your model won't converge.
+ It is good practice to call ``optimizer.zero_grad()`` before ``self.manual_backward(loss)``.

Contributor Author


I would leave it in both the manual and automatic optimization sections, since readers might only look at the section they're interested in, but the decision is yours. Should I apply your suggestion here?
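For reference, a minimal manual-optimization sketch following the quoted guidance (module and layer names are made up), with zero_grad called before manual_backward:

import torch
import torch.nn.functional as F
from pytorch_lightning import LightningModule


class LitManual(LightningModule):  # hypothetical example module
    def __init__(self):
        super().__init__()
        self.automatic_optimization = False
        self.layer = torch.nn.Linear(32, 1)

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()   # LightningOptimizer wrapping the raw optimizer
        opt.zero_grad()           # zero before backward, as the docs recommend
        x, y = batch
        loss = F.mse_loss(self.layer(x), y)
        self.manual_backward(loss)  # lets Lightning handle precision/accelerators
        opt.step()

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)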

@akihironitta akihironitta marked this pull request as draft April 16, 2021 13:48
Comment on lines -1364 to +1361
- if batch_idx % 2 == 0 :
-     optimizer.step(closure=optimizer_closure)
-     optimizer.zero_grad()
+ optimizer.step(closure=optimizer_closure)
- # update discriminator opt every 4 steps
+ # update discriminator opt every 2 steps
  if optimizer_idx == 1:
-     if batch_idx % 4 == 0 :
+     if (batch_idx + 1) % 2 == 0 :
Contributor Author


Just as a note: at least one of the two optimizers needs to update its parameters at every step; otherwise, half of the training dataset would effectively be ignored in this case...
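For readability, here is the updated example assembled in one place (a sketch only; the optimizer_step signature follows the 1.x-era LightningModule docs and may differ between Lightning versions). The generator steps every batch and the discriminator every second batch, so at least one optimizer consumes every batch.

def optimizer_step(self, epoch, batch_idx, optimizer, optimizer_idx,
                   optimizer_closure, on_tpu=False, using_native_amp=False,
                   using_lbfgs=False):
    # update generator opt every step
    if optimizer_idx == 0:
        optimizer.step(closure=optimizer_closure)

    # update discriminator opt every 2 steps
    if optimizer_idx == 1:
        if (batch_idx + 1) % 2 == 0:
            optimizer.step(closure=optimizer_closure)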

.. tip:: In manual mode we still automatically clip grads if Trainer(gradient_clip_val=x) is set
See :ref:`manual optimization<common/optimizers:Manual optimization>` for more examples.
.. tip:: In manual mode we still automatically accumulate grad over batches if
Contributor Author


As pointed out by @rubencart, this tip is outdated.

Comment on lines +173 to +175
.. warning::
* Before 1.3, Lightning automatically called ``lr_scheduler.step()`` in both automatic and manual optimization. From
1.3, ``lr_scheduler.step()`` is now for the user to call at arbitrary intervals.
Contributor Author


Is it possible that the manual lr_scheduler.step PR (#6825) won't make it into 1.3? If so, this needs to be updated.

Contributor


In my opinion they should both make it in
cc @edenlightning
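For context, a minimal sketch of the post-1.3 behavior described in the warning above (assuming manual optimization and that self.lr_schedulers() is available in that release; compute_loss is a hypothetical helper), where the user steps the scheduler at an interval of their choosing:

def training_step(self, batch, batch_idx):
    opt = self.optimizers()
    sch = self.lr_schedulers()

    opt.zero_grad()
    loss = self.compute_loss(batch)  # hypothetical helper computing the loss
    self.manual_backward(loss)
    opt.step()

    # the scheduler is no longer stepped automatically; step it at
    # whatever interval makes sense, e.g. every 10 batches
    if (batch_idx + 1) % 10 == 0:
        sch.step()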

@Borda Borda added the ready PRs ready to be merged label Apr 18, 2021
Collaborator

Borda commented Apr 18, 2021

@edenlightning @carmocca are we set here?

Contributor

@tchaton tchaton left a comment


LGTM !

@tchaton tchaton enabled auto-merge (squash) April 19, 2021 14:07
@tchaton tchaton added the _Will label Apr 19, 2021
@lexierule lexierule disabled auto-merge April 19, 2021 14:08
@lexierule lexierule merged commit d1529c2 into master Apr 19, 2021
@lexierule lexierule deleted the docs/clean-optimization branch April 19, 2021 14:08

Labels

  • docs: Documentation related
  • feature: Is an improvement or enhancement
  • ready: PRs ready to be merged
