LayernormSimpleRNN moved to addons #841
Conversation
Please update the code with the assumption of layernorm=True, and I will take a look again.
…normal self.bias term after scaling with layernorm for centering. Docstring with explanatory formulas added to the cell's call method.
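As a rough sketch of that ordering (a minimal, hedged example; the function and attribute names here are illustrative, not the exact ones in the PR): layernorm rescales the pre-activation with `center=False`, and the ordinary bias is added afterwards to do the centering.

```python
import tensorflow as tf

# Hedged sketch of one recurrent step as described in the commit message:
# layernorm only scales (center=False); the regular bias centers afterwards.
# All names are illustrative, not the exact attributes used in the PR.
def layernorm_simple_rnn_step(inputs, prev_output, kernel, recurrent_kernel,
                              bias, layernorm, activation=tf.tanh):
    h = tf.matmul(inputs, kernel) + tf.matmul(prev_output, recurrent_kernel)
    h = layernorm(h)   # scaling with gamma only, no beta
    h = h + bias       # centering via the normal bias term, after scaling
    return activation(h)
```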
The unit tests fail with this message
and
Is this error about this in
@seanpmorgan can you run kokoro:force-run again, please?
Python2 kokoro tests (GPU, CPU) are still failing with
In
bias_regularizer: Regularizer function applied to the bias vector
  (`use_bias=True`) or for the beta vector of the layer normalization
  layer (`use_layernorm=True`). Default: `None`.
gamma_regularizer: Regularizer function applied to the gamma vector
Not sure whether the gamma regularizer and constraint are useful or not. I chose not to expose those two in the layernorm LSTM, so that we don't have a large number of params that confuse users.
It's for training the scaling parameter in LayerNormalization:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/keras/layers/normalization.py
lines 1033 - 1040

Why set self.gamma_constraint?

- When your scaling parameter shrinks towards zero (e.g., other weights increase as well, or the inputs are unscaled), or becomes too large
- When you know from experience with the problem that the scaling parameter has always been in a certain range

Why set self.gamma_regularizer?

- When your scaling parameter explodes (e.g., input values and hidden state values are too small to generate new hidden values that are big enough)
The gamma constraint and regularizer are hyperparameters a user could try when model training terminates with weird scaling parameters.
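As a hedged usage sketch (the values are illustrative only), this is how the two knobs would be passed to tf.keras.layers.LayerNormalization:

```python
import tensorflow as tf

# Illustrative values only: constrain/regularize the scaling parameter (gamma).
ln = tf.keras.layers.LayerNormalization(
    center=False,
    scale=True,
    gamma_regularizer=tf.keras.regularizers.l2(1e-4),     # penalize exploding gamma
    gamma_constraint=tf.keras.constraints.MaxNorm(10.0),  # keep gamma in a known range
)
y = ln(tf.random.normal((2, 8)))  # beta is absent because center=False
```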
Mostly good. Thanks for the change.
epsilon=layernorm_epsilon,
center=False,
scale=True,
beta_initializer=None,
None seems to be a weird default. Should it be zeros?
If center=False, then beta is not used at all.
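A quick check that illustrates this (assuming current tf.keras behaviour):

```python
import tensorflow as tf

# With center=False, LayerNormalization never creates a beta weight,
# so beta_initializer has no effect at all.
ln = tf.keras.layers.LayerNormalization(center=False, scale=True)
ln(tf.zeros((2, 4)))                 # build the layer
print([w.name for w in ln.weights])  # only a gamma weight is listed
```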
Should we remove all three inputs that are like beta_initializer=None?
OK, I think letting layernorm use its default values should be fine.
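So the construction would simply omit the beta_* arguments, roughly like this (a sketch only; layernorm_epsilon stands in for the cell's constructor argument from the diff):

```python
import tensorflow as tf

layernorm_epsilon = 1e-5  # placeholder for the cell's constructor argument

# No beta_* arguments are passed; LayerNormalization keeps its defaults,
# which are irrelevant here anyway because center=False.
ln = tf.keras.layers.LayerNormalization(
    epsilon=layernorm_epsilon,
    center=False,
    scale=True,
)
```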
Sorry for the late reply, and thanks for the contribution.
Code moved from this PR: tensorflow/tensorflow#35469