add clip_grad_by_value feature #5477
Conversation
Hello @dhkim0225! Thanks for updating this PR.
Comment last updated at 2021-01-29 16:27:27 UTC
Codecov Report
@@            Coverage Diff             @@
##    release/1.2-dev    #5477    +/-  ##
================================================
+ Coverage         89%      93%     +4%
================================================
  Files            153      152      -1
  Lines          10803    10757     -46
================================================
+ Hits            9610     9958    +348
+ Misses          1193      799    -394
priancho
left a comment
If we need to allow a user to use the p-norm with any p value (not just the l2-norm), I think it is much more readable to use two separate flags, "gradient_clip_algorithm" and "gradient_clip_norm_type", than to use one flag for two purposes.
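For concreteness, here is a hypothetical sketch of the proposed two-flag configuration; the flag names follow this suggestion only and are not a confirmed Trainer signature.

    # Hypothetical two-flag configuration as proposed in this comment; the names
    # "gradient_clip_algorithm" and "gradient_clip_norm_type" are the reviewer's
    # suggestion, not a confirmed Trainer API.
    clipping_config = {
        "gradient_clip_val": 0.5,            # threshold used by either algorithm
        "gradient_clip_algorithm": "norm",   # selects the criterion: "norm" or "value"
        "gradient_clip_norm_type": 2.0,      # p of the p-norm, used only when algorithm == "norm"
    }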
docs/source/training_tricks.rst
Outdated
    norm <https://pytorch.org/docs/stable/nn.html#torch.nn.utils.clip_grad_norm_>`_ computed over all model parameters together.
    Gradient clipping may be enabled to avoid exploding gradients. Also, you can choose various criterion by
    `gradient_clip_algorithm` option. For example, if `gradient_clip_algorithm == 'value'`, this will `clip the gradient
    by value <https://pytorch.org/docs/stable/nn.html#torch.nn.utils.clip_grad_value_>`_ computed over all model parameters.
The explanation of its behavior when gradient_clip_algorithm is 'value' is incorrect.
What about the following one?
Gradient clipping may be enabled to avoid exploding gradients. By default, this will clip the gradient `norm <https://pytorch.org/docs/stable/nn.html#torch.nn.utils.clip_grad_norm_>`_ computed over all model parameters together. If the `gradient_clip_algorithm` option (default: 'norm') is set to 'value', this will instead clip the gradient `value <https://pytorch.org/docs/stable/nn.html#torch.nn.utils.clip_grad_value_>`_ for each parameter.
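To make the behavioral difference concrete, here is a small standalone example using the underlying torch utilities (the tensors and thresholds are illustrative only):

    import torch

    p = torch.nn.Parameter(torch.zeros(3))

    # Clip by norm: the whole gradient vector is rescaled so its 2-norm is at most 1.0.
    p.grad = torch.tensor([3.0, -4.0, 0.0])
    torch.nn.utils.clip_grad_norm_([p], max_norm=1.0)     # grad becomes ~[0.6, -0.8, 0.0]

    # Clip by value: each gradient element is clamped independently to [-1.0, 1.0].
    p.grad = torch.tensor([3.0, -4.0, 0.0])
    torch.nn.utils.clip_grad_value_([p], clip_value=1.0)  # grad becomes [1.0, -1.0, 0.0]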
Thank you for the comment! I'll change it.
    optimizer: Optimizer,
    grad_clip_val: Union[float, int],
    gradient_clip_algorithm: str,
    norm_type: Union[float, int]):
Why did you remove the default value for norm_type?
I was hoping that norm_type would follow the trainer's default value, and I thought a default value on the (local) function could confuse users.
    if gradient_clip_algorithm == 'value':
        torch.nn.utils.clip_grad_value_(parameters, clip_value=grad_clip_val)
    elif gradient_clip_algorithm.startswith('norm'):
        max_norm = grad_clip_val
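For orientation, here is a minimal standalone sketch of the value/norm dispatch under review, written against the vanilla torch utilities (the function name and defaults are illustrative, and the PR re-implements the norm branch manually, as discussed below):

    from typing import Iterable, Union

    import torch

    def _clip_gradients(parameters: Iterable[torch.nn.Parameter],
                        grad_clip_val: Union[float, int],
                        gradient_clip_algorithm: str = "norm",
                        norm_type: Union[float, int] = 2.0) -> None:
        # Illustrative sketch only, not the exact PR code.
        if gradient_clip_algorithm == "value":
            # Clamp each gradient element to [-grad_clip_val, grad_clip_val].
            torch.nn.utils.clip_grad_value_(parameters, clip_value=grad_clip_val)
        elif gradient_clip_algorithm == "norm":
            # Rescale all gradients together so their total p-norm is at most grad_clip_val.
            torch.nn.utils.clip_grad_norm_(parameters, max_norm=grad_clip_val, norm_type=norm_type)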
It seems like we can't use the torch.nn.utils.clip_grad_norm_() method because the epsilon value added to the denominator during gradient scaling is hard-coded as "1e-6" in that method :-0
Yup.
However, since the native AMP plugin uses torch.nn.utils.clip_grad_norm_ (epsilon 1e-6), I wonder why.
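For reference, a norm-clipping helper with a configurable epsilon is straightforward to write; the following is only a sketch of that idea, not the PR's actual accelerator code:

    from typing import Iterable

    import torch

    def clip_grad_norm_with_eps(parameters: Iterable[torch.nn.Parameter], max_norm: float,
                                norm_type: float = 2.0, eps: float = 1e-6) -> torch.Tensor:
        # Behaves like torch.nn.utils.clip_grad_norm_, but exposes eps instead of
        # relying on the hard-coded 1e-6 in the built-in utility.
        grads = [p.grad.detach() for p in parameters if p.grad is not None]
        total_norm = torch.norm(torch.stack([torch.norm(g, norm_type) for g in grads]), norm_type)
        clip_coef = max_norm / (total_norm + eps)
        if clip_coef < 1:
            for g in grads:
                g.mul_(clip_coef)  # in-place scaling also updates p.grad
        return total_norm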
    norm_type: Union[float, int]):
    if gradient_clip_algorithm == 'value':
        raise NotImplementedError("Value grad clipping with sharded ddp is not implemented yet")
    elif gradient_clip_algorithm.startswith('norm'):
The following two lines are missing:

    max_norm = grad_clip_val
    norm_type = float(2.0)
I think the max_norm variable doesn't have to be reassigned; grad_clip_val and norm_type can be passed directly as arguments.
Thank you for reviewing my PR @priancho! Since I'll be busy for the next couple of days, I'll resume work after this Saturday.
remove conflict on changelog
tchaton
left a comment
Would you mind adding some tests?
    else:
        model = self.trainer.get_model()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=grad_clip_val, norm_type=norm_type)
    if clip_algorithm == 'value':
Let's use a LightningEnum there.
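A sketch of what an enum-based check could look like, assuming LightningEnum is importable from pytorch_lightning.utilities; the enum name and members below are illustrative rather than the merged code:

    import torch
    from pytorch_lightning.utilities import LightningEnum  # import path assumed

    class GradClipAlgorithmType(LightningEnum):
        # Illustrative enum; the name and members in the merged code may differ.
        VALUE = "value"
        NORM = "norm"

    def clip_by_enum(parameters, grad_clip_val: float, gradient_clip_algorithm: str) -> None:
        # LightningEnum subclasses str, so the plain string flag compares equal to the member.
        if gradient_clip_algorithm == GradClipAlgorithmType.VALUE:
            torch.nn.utils.clip_grad_value_(parameters, clip_value=grad_clip_val)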
Thank you for the review. I'll change my code.
    p.grad.data.mul_(clip_coef.to(p.grad.data.device))
    if gradient_clip_algorithm == 'value':
        torch.nn.utils.clip_grad_value_(parameters, clip_value=grad_clip_val)
    elif gradient_clip_algorithm.startswith('norm'):
Why are you using .startswith('norm')?
It's totally my mistake; it will be changed.
The previous implementation used 'norm' + str(norm_type) inputs such as 'norm2' and 'norm3'.
    gradient_clip_algorithm: str,
    norm_type: Union[float, int]):
    if gradient_clip_algorithm == 'value':
        raise NotImplementedError("Value grad clipping with sharded ddp is not implemented yet")
Open an issue on FairScale to ask them to support this feature. :)
facebookresearch/fairscale#308
Opened ... !! :)
See my comment above; the vanilla clip_grad_value_ should actually just work.
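Since value clipping clamps each gradient element independently, it needs no cross-shard norm reduction, so each rank clipping only the parameters it owns would be enough; a hedged sketch (not the merged sharded-plugin code):

    import torch

    def sharded_clip_by_value(local_parameters, grad_clip_val: float) -> None:
        # Element-wise clipping needs no global norm, so each rank can simply
        # clamp the gradients of the parameters it owns.
        torch.nn.utils.clip_grad_value_(local_parameters, clip_value=grad_clip_val)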
…/pytorch-lightning into feature/clip_grad_by_value_1.2-dev
Borda
left a comment
At first glance it looks good to me. @SeanNaren, mind reviewing?
priancho
left a comment
There is a line of code that should be deleted.
I love this! Great work on the PR :) We're doing a bit of an accelerator refactor, and it might be better for these changes to end up in the new API. Thoughts @tchaton @awaelchli @justusschock?
@SeanNaren
Hey @dhkim0225, we merged a big refactor of the accelerators. The simplest approach would be to rebase on master.
@tchaton Okay, I'll work on this on Friday.
Dear @dhkim0225, any updates? Do you need help with rebasing? Best,
@tchaton Sorry for the delay. I've just started.
What does this PR do?
Fixes #5460, #5456, #4927
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing, make sure you have read the Review guidelines.
Did you have fun?
Make sure you had fun coding 🙃