Describe the bug
Unlike the rest of the optimizers API, LAMB names its weight-decay argument `weight_decay_rate` rather than `weight_decay`.
In experiments that sweep over several optimizers, this forces a special case for LAMB just to pass the same value under a different argument name, as sketched below.
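For illustration, a minimal sketch of the kind of special case this forces (the helper name and decay value are hypothetical, not part of the library):

```python
import tensorflow_addons as tfa

def build_optimizer(name: str, decay: float = 1e-4):
    """Hypothetical helper showing the LAMB-specific branch."""
    if name == "lamb":
        # LAMB spells the argument differently from the rest of the API.
        return tfa.optimizers.LAMB(weight_decay_rate=decay)
    # AdamW (and the other decoupled-decay optimizers) accept `weight_decay`.
    return tfa.optimizers.AdamW(weight_decay=decay)
```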
Code to reproduce the issue
```python
weight_decay_rate: FloatTensorLike = 0.0,
```