
Conversation

@akihironitta (Contributor) commented Feb 23, 2021

What does this PR do?

Fixes #4083
Fixes #5545
To-check #6134

Description of the changes

Makes sure to call zero_grad inside the closure function (TrainerLoop.training_step_and_backward()).

Note that this positions the zero_grad call before backward, as generally suggested throughout PyTorch's docs.

After LBFGS was reported not to work in #4083, we found that the number of times zero_grad is actually called differs between Lightning and pure PyTorch:

  • Lightning calls closure 20 times but zero_grad only 1 time, while
  • PyTorch calls closure 20 times and zero_grad 20 times, where 20 is the value of torch.optim.LBFGS(..., max_iter=20) (because the closure itself calls zero_grad; see the sample scripts below).

As mentioned in the PyTorch docs, the closure should call zero_grad, but Lightning currently calls it outside the closure rather than inside, so optimizers that need to re-evaluate the loss in optimizer.step(closure) don't work properly.

The closure should clear the gradients, compute the loss, and return it.
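For reference, the canonical closure described in the PyTorch docs can be sketched as below. This is a standalone toy example, not part of this PR's diff; the linear model, random batch, and MSE loss are placeholders:

import torch
import torch.nn.functional as F

model = torch.nn.Linear(32, 2)
optimizer = torch.optim.LBFGS(model.parameters(), lr=0.01, max_iter=20)
x = torch.randn(8, 32)  # dummy batch

def closure():
    optimizer.zero_grad()   # clear stale gradients first
    prediction = model(x)   # re-evaluate the model
    loss = F.mse_loss(prediction, torch.ones_like(prediction))
    loss.backward()         # compute fresh gradients
    return loss             # step() uses the returned loss

optimizer.step(closure)     # LBFGS calls the closure up to max_iter times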

TODO

  • Call zero_grad in closure (and remove zero_grad calls outside the closure)
  • Update docs to reflect the new zero_grad position
  • Update docs to recommend manual optimization when using an optimizer similar to torch.optim.LBFGS, which needs re-evaluation of the loss via a closure (a rough sketch of this pattern follows below this list).
  • Ensure that scheduler.step is called the same number of times as optimizer.step in manual optimization. I'll disable scheduler.step in manual optimization in another PR. cc: @carmocca

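As a rough sketch of that recommendation (not part of this PR's changes; the module name, layer, and loss are placeholders modeled on BoringModel, assuming the current manual-optimization API), an LBFGS user could opt out of automatic optimization and drive the closure themselves:

import torch
import torch.nn.functional as F
import pytorch_lightning as pl

class ManualLBFGSModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.automatic_optimization = False  # opt out of Lightning's automatic loop
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()

        def closure():
            opt.zero_grad()  # clear gradients inside the closure
            prediction = self.layer(batch)
            loss = F.mse_loss(prediction, torch.ones_like(prediction))
            self.manual_backward(loss)  # let Lightning run the backward pass
            return loss

        opt.step(closure=closure)  # LBFGS re-evaluates the closure as needed

    def configure_optimizers(self):
        return torch.optim.LBFGS(self.parameters(), lr=0.01, max_iter=20)
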
Here are the minimal code examples using BoringModel.

Lightning code
import pytorch_lightning as pl
import torch
from torch.utils.data import DataLoader, Dataset
pl.seed_everything(42)

class RandomDataset(Dataset):
    def __init__(self, size, num_samples):
        self.len = num_samples
        self.data = torch.randn(num_samples, size)
    def __getitem__(self, index):
        return self.data[index]
    def __len__(self):
        return self.len

class BoringModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)
    def forward(self, x):
        return self.layer(x)
    def loss(self, batch, prediction):
        return torch.nn.functional.mse_loss(prediction, torch.ones_like(prediction))
    def training_step(self, batch, batch_idx):
        output = self.layer(batch)
        loss = self.loss(batch, output)
        return {"loss": loss}
    def training_step_end(self, training_step_outputs):
        loss = training_step_outputs["loss"]
        print("loss:", loss.item())
        return training_step_outputs
    def configure_optimizers(self):
        # optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        optimizer = torch.optim.LBFGS(self.parameters(), lr=0.01, max_iter=20)  # re-evaluates the loss via a closure in step()
        return optimizer

def main():
    ds = RandomDataset(32, 100000)
    dl = DataLoader(ds, batch_size=1024)
    model = BoringModel()
    trainer = pl.Trainer(
        progress_bar_refresh_rate=0,
        fast_dev_run=1,
    )
    trainer.fit(model, dl)

if __name__ == "__main__":
    main()
Pure PyTorch code
import torch
import torch.nn as nn
from pytorch_lightning import seed_everything
from torch.utils.data import DataLoader, Dataset
seed_everything(42)

class RandomDataset(Dataset):
    def __init__(self, size, num_samples):
        self.len = num_samples
        self.data = torch.randn(num_samples, size)
    def __getitem__(self, index):
        return self.data[index]
    def __len__(self):
        return self.len

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.layer = torch.nn.Linear(32, 2)
    def forward(self, x):
        return self.layer(x)

def main():
    ds = RandomDataset(32, 100000)
    dl = DataLoader(ds, batch_size=1024)
    model = Model()
    # optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    optimizer = torch.optim.LBFGS(model.parameters(), lr=0.01, max_iter=20)
    for epoch in range(3):
        for i, x in enumerate(dl):
            def closure():
                prediction = model(x)
                loss = torch.nn.functional.mse_loss(prediction, torch.ones_like(prediction))
                optimizer.zero_grad()  # removing this line causes the same bug as in Lightning script
                loss.backward()
                print("loss:", loss.item())
                return loss
            loss_out = optimizer.step(closure=closure)

if __name__ == '__main__':
    main()

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing, make sure you have read the Review guidelines. In short, see the following bullet-list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

cc: @carmocca @tchaton

codecov bot commented Feb 23, 2021

Codecov Report

Merging #6147 (9836bb7) into master (40d5a9d) will decrease coverage by 3%.
The diff coverage is 100%.

@@           Coverage Diff           @@
##           master   #6147    +/-   ##
=======================================
- Coverage      93%     90%    -3%     
=======================================
  Files         159     159            
  Lines       11380   11543   +163     
=======================================
- Hits        10624   10415   -209     
- Misses        756    1128   +372     

@akihironitta akihironitta changed the title Call optimizer.zero_grad() inside closure Call optimizer.zero_grad() before backward inside closure Feb 25, 2021
@akihironitta akihironitta changed the title Call optimizer.zero_grad() before backward inside closure Call optimizer.zero_grad() before backward inside closure in AutoOpt Feb 25, 2021
@akihironitta akihironitta marked this pull request as ready for review February 27, 2021 13:20
@carmocca carmocca merged commit 925f082 into Lightning-AI:master Mar 1, 2021
@akihironitta akihironitta deleted the bugfix/4083_lbfgs branch March 1, 2021 14:16
@tchaton tchaton added the bug Something isn't working label Mar 2, 2021
@tchaton tchaton added this to the 1.2.x milestone Mar 2, 2021
kaushikb11 pushed a commit to kaushikb11/pytorch-lightning that referenced this pull request Mar 2, 2021
kaushikb11 pushed a commit to kaushikb11/pytorch-lightning that referenced this pull request Mar 2, 2021
lexierule pushed a commit that referenced this pull request Mar 5, 2021