tensorflow_addons/optimizers — 1 file changed, +10 −10 lines
 @tf.keras.utils.register_keras_serializable(package="Addons")
 class NovoGrad(tf.keras.optimizers.Optimizer):
26- """The NovoGrad Optimizer was first proposed in [Stochastic Gradient
27- Methods with Layerwise Adaptvie Moments for training of Deep
28- Networks](https://arxiv.org/pdf/1905.11286.pdf)
29-
30- NovoGrad is a first-order SGD-based algorithm, which computes second
31- moments per layer instead of per weight as in Adam. Compared to Adam,
32- NovoGrad takes less memory, and has been found to be more numerically
33- stable. More specifically we compute (for more information on the
34- computation please refer to this
35- [link](https://nvidia.github.io/OpenSeq2Seq/html/optimizers.html):
26+ """Optimizer that implements NovoGrad.
27+
28+ The NovoGrad Optimizer was first proposed in [Stochastic Gradient
29+ Methods with Layerwise Adaptive Moments for training of Deep
30+ Networks](https://arxiv.org/pdf/1905.11286.pdf) NovoGrad is a
31+ first-order SGD-based algorithm, which computes second moments per
32+ layer instead of per weight as in Adam. Compared to Adam, NovoGrad
33+ takes less memory, and has been found to be more numerically stable.
34+ (For more information on the computation please refer to this
35+ [link](https://nvidia.github.io/OpenSeq2Seq/html/optimizers.html))
 
     Second order moment = exponential moving average of Layer-wise square
     of grads:
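
To make the layer-wise second moment described in the new docstring concrete, here is a minimal sketch of that update. This is not the tensorflow_addons implementation of the optimizer; the function name, variable names, and the `beta_2` value are assumptions chosen only to mirror the formula above.

```python
import tensorflow as tf

# Illustrative sketch of NovoGrad's layer-wise second moment:
# an exponential moving average of the squared gradient norm per layer.
# NOT the tensorflow_addons implementation; names and the beta_2 value
# are assumptions for demonstration only.
beta_2 = 0.98

def update_second_moment(v_prev, layer_grad):
    # ||g_l||^2 is a single scalar per layer, unlike Adam's
    # element-wise squared gradients.
    grad_norm_sq = tf.reduce_sum(tf.square(layer_grad))
    if v_prev is None:
        # First step: initialize v with the current squared norm.
        return grad_norm_sq
    # v_l^t = beta_2 * v_l^(t-1) + (1 - beta_2) * ||g_l^t||^2
    return beta_2 * v_prev + (1.0 - beta_2) * grad_norm_sq

# Example: two updates for a single layer's gradient tensor.
grad = tf.constant([[0.1, -0.2], [0.3, 0.4]])
v = update_second_moment(None, grad)
v = update_second_moment(v, grad)
```

In practice the optimizer itself is constructed like any other Keras optimizer, e.g. `tfa.optimizers.NovoGrad(learning_rate=1e-3)`, and passed to `model.compile`.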