Conversation

@ananthsub (Contributor) commented on Oct 28, 2020

What does this PR do?

From #4301

Even when gradient accumulation is enabled with DDP, we still see significant time spent in the backward pass.
#4301 enables no_sync while accumulating gradients. However, in the backward pass we use the module inside the DDP wrapper to compute the backward. This circumvents the require_backward_grad_sync=False flag set on the wrapped DDP model, so we miss out on the gradient accumulation speedups.

https://github.com/PyTorchLightning/pytorch-lightning/blob/41de4538aa0c187793709a93875e67666c2ddde8/pytorch_lightning/trainer/connectors/model_connector.py#L54-L57

https://github.com/PyTorchLightning/pytorch-lightning/blob/41de4538aa0c187793709a93875e67666c2ddde8/pytorch_lightning/accelerators/accelerator.py#L89-L101
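For reference, this is how plain PyTorch skips gradient synchronization on accumulation steps via DDP's no_sync(). The sketch below is illustrative only and is not Lightning code; the function, batch iterable, and accumulation factor are placeholders.

```python
import torch
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Illustrative only (not Lightning code). Assumes the process group is already
# initialized (e.g. via torchrun) and that `ddp_model` wraps the user's module.
def train(ddp_model: DDP, optimizer: torch.optim.Optimizer, batches, accumulate: int = 4):
    loss_fn = nn.MSELoss()
    for step, (x, y) in enumerate(batches):
        last_of_window = (step + 1) % accumulate == 0
        if not last_of_window:
            # no_sync() sets require_backward_grad_sync=False on the wrapper,
            # so the backward for this forward skips the gradient all-reduce.
            with ddp_model.no_sync():
                loss_fn(ddp_model(x), y).backward()
        else:
            # Gradients accumulated so far are all-reduced during this backward.
            loss_fn(ddp_model(x), y).backward()
            optimizer.step()
            optimizer.zero_grad()
```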

Before submitting

  • Was this discussed/approved via a GitHub issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together? Otherwise, we ask you to create a separate PR for every change.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?
  • Did you verify new and existing tests pass locally with your changes?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

  • Is this pull request ready for review? (if not, please submit in draft mode)

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

   else:
       # do backward pass
-      model = self.trainer.get_model()
+      model = self.trainer.model
A Contributor commented:

Ummm... this will break the other accelerators, no? DP will be wrapped, and so will the DDP one?

@ananthsub (Contributor, Author) replied on Oct 28, 2020:

The motivating factor is skipping parameter syncs in DDP (#4301). While accumulating gradients:

  • training_step_and_backward calls the training loop's backward
  • the training loop's backward calls the accelerator's backward
  • the accelerator's backward reaches into the DP/DDP model, extracts the module inside, and calls backward on that
  • which ignores the flags (e.g. require_backward_grad_sync) set on the wrapper

How should we respect those settings in the backward pass here?
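For illustration only, a minimal sketch of the mismatch described above; the helper name is hypothetical and this is not Lightning's accelerator code.

```python
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Hypothetical helper (not Lightning API): the flag toggled by no_sync() lives
# on the DDP wrapper, not on the module it wraps, so any code path that
# unwraps the model before running the backward pass can no longer tell
# whether it is in an accumulation ("skip sync") phase.
def is_skipping_grad_sync(model: nn.Module) -> bool:
    if isinstance(model, DDP):
        return not model.require_backward_grad_sync
    # An unwrapped module carries no such flag; the information is lost.
    return False
```

In plain PyTorch, DDP consults this flag during the forward pass to decide whether the subsequent backward will all-reduce gradients, which is why state kept on the wrapper needs to remain visible to the code driving the backward.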
