Update base for Update on "create T5MultiheadAttention module"
# Description
Add T5 architecture to torchtext
# Process
The T5 architecture is very similar to that of a traditional transformer. The main differences are that, rather than using positional embeddings, T5 computes a relative attention bias that encodes the relative position of each token within a sequence. This position bias is passed into each layer and used when computing the attention scores. T5 also uses a simplified layer normalization (root mean square normalization), which is applied at the start of every attention and feed-forward block.
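As a rough illustration of the simplified layer normalization described above, here is a minimal sketch; the class name `T5LayerNorm` and the `eps` default are assumptions for illustration, not the final torchtext API:

```python
import torch
import torch.nn as nn


class T5LayerNorm(nn.Module):
    """Root mean square layer normalization: no mean subtraction and no bias,
    only a learned per-feature scale (a sketch, not the final implementation)."""

    def __init__(self, d_model: int, eps: float = 1e-6) -> None:
        super().__init__()
        self.weight = nn.Parameter(torch.ones(d_model))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the root mean square over the feature dimension.
        variance = x.pow(2).mean(dim=-1, keepdim=True)
        return self.weight * x * torch.rsqrt(variance + self.eps)
```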
Incorporating relative attention bias requires under-the-hood changes to the multi-head attention module. We can use HF's implementation for computing the relative attention bias and modify the source code of `torch.nn.MultiheadAttention` to incorporate it. We can also create our own layer normalization, following HF's approach.
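To show how the relative attention bias enters the attention computation, here is a hedged sketch; the function name and tensor shapes are hypothetical and do not reflect the modified `torch.nn.MultiheadAttention` signature:

```python
import torch


def attention_with_position_bias(q, k, v, position_bias, mask=None):
    # q, k, v: (batch, num_heads, seq_len, head_dim)
    # position_bias: (1, num_heads, q_len, k_len) relative attention bias
    # mask: optional boolean mask, True at positions to be blocked
    scores = torch.matmul(q, k.transpose(-2, -1))
    # The bias is added to the raw scores before the softmax.
    # (T5 folds the usual 1/sqrt(d) scaling into its weight initialization,
    # so no explicit scaling is applied here.)
    scores = scores + position_bias
    if mask is not None:
        scores = scores.masked_fill(mask, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return torch.matmul(weights, v)
```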
Given the above components, we can then define our own T5Layer, T5Stack, and T5Model.
* The T5Layer can be used either as an encoder layer or a decoder layer based on an input boolean parameter. The only difference between the decoder layer and the encoder layer is that the decoder layer also performs cross-attention with the encoder output.
* T5Stack can also be used as either an encoder or decoder based on an input boolean parameter. This dictates which type of layer composes the stack.
* T5Model can be used either as an encoder-only or an encoder-decoder model based on an input boolean parameter. If it is an encoder-decoder model, a causal mask is generated for the decoder input tokens (see the sketch after this list).
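For the encoder-decoder case, the causal mask mentioned above can be generated along these lines; this is only a sketch, and the exact helper used in the PR may differ:

```python
import torch


def causal_mask(seq_len: int) -> torch.Tensor:
    # True marks future positions that a decoder token may not attend to,
    # matching the boolean-mask convention where True entries are blocked.
    return torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
```

The resulting boolean mask can then be combined with any padding mask and applied to the decoder's self-attention scores (e.g. via `masked_fill`).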
# Testing
Not yet implemented.
# Stack
WIP PR where implementation details were discussed: #1812
[ghstack-poisoned]