# NOTE: Taken from https://github.com/huggingface/transformers/blob/8581a798c0a48fca07b29ce2ca2ef55adcae8c7e/src/transformers/models/t5/modeling_t5.py#L239
import torch
from torch import nn, Tensor


class T5LayerNorm(nn.Module):
    def __init__(self, d_model, eps=1e-6) -> None:
        """
        Construct a layernorm module in the T5 style. No bias and no subtraction of mean.
        """
        super().__init__()
        self.weight = nn.Parameter(torch.ones(d_model))
        self.variance_epsilon = eps

    def forward(self, hidden_states: Tensor) -> Tensor:
        r"""
        T5 uses a layer_norm which only scales and doesn't shift, also known as Root Mean
        Square Layer Normalization (https://arxiv.org/abs/1910.07467); thus the variance is
        calculated without subtracting the mean and there is no bias. Additionally, we want
        to make sure that the accumulation for half-precision inputs is done in fp32.

        Args:
            hidden_states: Tensor to be normalized. The final dimension must be the model
                dimension (i.e. the number of expected features in the input).

        Returns:
            A Tensor with the same shape as ``hidden_states`` after having been normalized.
        """
        # Accumulate the mean of squares in fp32 so half-precision inputs stay numerically stable.
        variance = hidden_states.to(torch.float32).pow(2).mean(-1, keepdim=True)
        hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)

        # Cast back to half precision if the module's weight is kept in fp16/bf16.
        if self.weight.dtype in (torch.float16, torch.bfloat16):
            hidden_states = hidden_states.to(self.weight.dtype)
        return self.weight * hidden_states
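
# Minimal usage sketch (not part of the referenced file), assuming the T5LayerNorm class
# above is in scope: it runs the layer norm on a half-precision input to show that the
# output keeps the input's shape and dtype while the variance is accumulated in fp32.
if __name__ == "__main__":
    d_model = 512
    norm = T5LayerNorm(d_model, eps=1e-6).to(torch.float16)

    x = torch.randn(2, 16, d_model, dtype=torch.float16)  # (batch, seq_len, d_model)
    y = norm(x)

    assert y.shape == x.shape          # normalization preserves the input shape
    assert y.dtype == torch.float16    # result is cast back to the weight's dtype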