🚀 Feature
Allow specification of the gradient clipping norm_type, which is currently fixed to the Euclidean (L2) norm.
Motivation
We are using PyTorch Lightning to improve training performance in a standalone Federated Learning context (experimental setting). In this context the locally trained models diverge from their underlying data and are aggregated on the server side, which in general leads to larger gradients. To preserve the direction of the gradient while limiting the magnitude per single dimension, we need to apply the inf norm.
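For illustration, here is a small standalone snippet (not Lightning code) using `torch.nn.utils.clip_grad_norm_`, which already accepts a `norm_type` argument: with `norm_type=float("inf")` the gradient is rescaled so that its largest absolute component equals the clip value, while its direction is unchanged.

```python
import torch
from torch.nn.utils import clip_grad_norm_

# Toy parameter with a gradient that has one dominant component.
param = torch.nn.Parameter(torch.zeros(4))
param.grad = torch.tensor([10.0, 0.1, 0.2, 0.3])

# Default Euclidean (L2) clipping: scale factor is max_norm / ||g||_2.
clip_grad_norm_([param], max_norm=1.0, norm_type=2.0)
print(param.grad)  # roughly tensor([0.9993, 0.0100, 0.0200, 0.0300])

# Inf-norm clipping: scale factor is max_norm / max|g_i|, so the largest
# component is capped exactly at max_norm and the direction is preserved.
param.grad = torch.tensor([10.0, 0.1, 0.2, 0.3])
clip_grad_norm_([param], max_norm=1.0, norm_type=float("inf"))
print(param.grad)  # tensor([1.0000, 0.0100, 0.0200, 0.0300])
```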
Pitch
Add a parameter gradient_clipping_norm_type: float = 2.0 to the Trainer and pass it through to the _clip_gradients method, changing the call from
_clip_gradients(optimizer, grad_clip_val)
to something like
_clip_gradients(optimizer, grad_clip_val, grad_clip_norm_type)
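To make the scope concrete, here is a minimal standalone sketch of the proposed plumbing. The function below is a stand-in for Lightning's internal helper (its real signature may differ); since `torch.nn.utils.clip_grad_norm_` already supports `norm_type`, the change is mostly about forwarding the new argument.

```python
import torch
from torch.nn.utils import clip_grad_norm_

def _clip_gradients(optimizer, grad_clip_val, grad_clip_norm_type=2.0):
    """Stand-in for the Lightning helper: clip all gradients handled by
    this optimizer with the requested p-norm (2.0 keeps today's behaviour,
    float('inf') covers our Federated Learning use case)."""
    parameters = [p for group in optimizer.param_groups for p in group["params"]]
    clip_grad_norm_(parameters, max_norm=grad_clip_val, norm_type=grad_clip_norm_type)

# Example usage with a toy model and optimizer.
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()
_clip_gradients(optimizer, grad_clip_val=1.0, grad_clip_norm_type=float("inf"))
optimizer.step()
```

A matching gradient_clipping_norm_type argument on the Trainer would default to 2.0, so existing behaviour stays unchanged.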
Additional context
The impact is minimal and only affects the _clip_gradients function and the underlying clipping call it makes.