
Conversation

pmabbo13 added a commit that referenced this pull request Jul 13, 2022
…er model

ghstack-source-id: 946c573
Pull Request resolved: #1829
@pmabbo13 pmabbo13 requested a review from Nayef211 July 13, 2022 18:15
Contributor

@Nayef211 Nayef211 left a comment

@parmeet I wonder if you think it makes sense to update our docs to include the T5Model class as well as the yet-to-be-created T5Bundle. Similarly, is there a reason why we don't include the RobertaModel class in the docstring if it's a public-facing component?

pmabbo13 added a commit that referenced this pull request Jul 14, 2022
…er model

ghstack-source-id: 0219b63
Pull Request resolved: #1829
@pmabbo13
Contributor Author

Description

The T5Model implementation is very similar to the nn.Transformer implementation, with some additional functionality. The model:

  1. Takes a tokenized encoder input sequence and a decoder input sequence and transforms them into word embeddings.
  2. Computes the padding masks for the input sequences based on the padding_idx argument passed when initializing the model.
  3. Generates a causal mask for decoder self-attention unless one has already been provided via the decoder_mask argument to the forward method.
  4. Returns the output of the final layer of the encoder and decoder, the output at each layer of the encoder and decoder, the self-attention scores of each layer of the encoder and decoder, and the cross-attention scores of each layer of the decoder.
  5. Runs only the encoder portion and returns its corresponding outputs if encoder_only=True is passed when initializing the model (see the usage sketch below).
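
A minimal usage sketch of the behavior described above. The import path, constructor arguments, and return structure here are assumptions based on this description, not necessarily the final prototype API:

```python
import torch
from torchtext.prototype.models import T5Model  # import path is an assumption

# Hypothetical configuration; argument names follow the description above.
model = T5Model(encoder_only=False, padding_idx=0)

# Tokenized input sequences (batch of 2), padded with padding_idx=0.
encoder_tokens = torch.tensor([[13, 37, 42, 0], [7, 19, 0, 0]])
decoder_tokens = torch.tensor([[1, 5, 9], [1, 8, 0]])

# Padding masks are computed internally from padding_idx (item 2), and a
# causal mask is generated for decoder self-attention because no
# decoder_mask is supplied here (item 3).
outputs = model(encoder_tokens, decoder_tokens)

# Per item 4, the result bundles final-layer outputs, per-layer outputs,
# and self-/cross-attention scores for the encoder and decoder.
```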

@pmabbo13 pmabbo13 requested a review from parmeet July 15, 2022 15:58
@pmabbo13 pmabbo13 requested a review from abhinavarora July 15, 2022 15:58
pmabbo13 added a commit that referenced this pull request Jul 15, 2022
…er model

ghstack-source-id: b5a8a5e
Pull Request resolved: #1829
Contributor

@parmeet parmeet left a comment

Overall LGTM!

@parmeet
Contributor

parmeet commented Jul 15, 2022

I wonder if you think it makes sense to update our docs to include the T5Model class as well as the yet to be created T5Bundle.

This is just a prototype feature, so it may not be necessary to include it in the docs. I don't see the other domains doing this either.

Similarly is there a reason why we don't include the RobertaModel class in the docstring if it's a public facing component?

Hmm, that's a good catch. Not really. It looks like we missed including the docs for it; perhaps we were only focusing on the Roberta Bundler API that exposes this model to users.

pmabbo13 added a commit that referenced this pull request Jul 15, 2022
…er model

ghstack-source-id: 0ae4e98
Pull Request resolved: #1829
pmabbo13 added a commit that referenced this pull request Jul 18, 2022
…er model

ghstack-source-id: a5da3a7
Pull Request resolved: #1829
@pmabbo13 pmabbo13 merged commit adbc511 into gh/pmabbo13/9/base Jul 18, 2022
pmabbo13 added a commit that referenced this pull request Jul 18, 2022
* compute relative position buckets for relative attention bias
* compute relative position bias for t5 attention
* compute attention scores for t5 model using relative attention bias
* perform multihead attention using relative attention bias for t5 model
* create T5MultiheadAttention module
* add layer norm module for t5 model
* add t5 layer module that can be used for both encoder or decoder stack
* add t5 stack that can function as either the encoder or decoder of a t5 model

* add t5 model that can function as either encoder-only or encoder-decoder model (#1829)
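
For context on the first commit above, here is a minimal sketch of the relative position bucketing scheme from the T5 paper (Raffel et al., 2020); the function name, signature, and defaults are illustrative assumptions, not necessarily this PR's exact code:

```python
import math

import torch


def relative_position_bucket(relative_position: torch.Tensor,
                             bidirectional: bool = True,
                             num_buckets: int = 32,
                             max_distance: int = 128) -> torch.Tensor:
    """Map (key_pos - query_pos) offsets to bucket ids for the relative
    attention bias: exact buckets for nearby offsets, logarithmically
    spaced buckets for distant ones."""
    buckets = torch.zeros_like(relative_position)
    if bidirectional:
        # Encoder case: split the buckets between positive and negative offsets.
        num_buckets //= 2
        buckets += (relative_position > 0).long() * num_buckets
        n = relative_position.abs()
    else:
        # Decoder (causal) case: only past positions are attended to.
        n = torch.clamp(-relative_position, min=0)

    max_exact = num_buckets // 2
    is_small = n < max_exact
    # Log-spaced bucket index for large offsets, capped at the last bucket.
    large = max_exact + (
        torch.log(n.float() / max_exact)
        / math.log(max_distance / max_exact)
        * (num_buckets - max_exact)
    ).long()
    large = torch.clamp(large, max=num_buckets - 1)
    return buckets + torch.where(is_small, n, large)
```

A learned embedding over these bucket ids then yields the per-head bias added to the raw attention scores, which is what the second and third commits describe wiring into T5MultiheadAttention.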
@facebook-github-bot facebook-github-bot deleted the gh/pmabbo13/9/head branch August 18, 2022 14:20