add t5 model that can function as both encodery-only or encoder-decoder model #1829
Nayef211
left a comment
@parmeet I wonder if you think it makes sense to update our docs to include the T5Model class as well as the yet-to-be-created T5Bundle. Similarly, is there a reason why we don't include the RobertaModel class in the docs if it's a public-facing component?
Description
The T5Model implementation is very similar to the nn.Transformer implementation, with some additional functionality.
parmeet
left a comment
Overall LGTM!
This is just a prototype feature, so it may not be necessary to include it in the docs. I don't see other domains doing this either.
Hmm, that's a good catch. Not really. It looks like we missed including the docs for it; perhaps we were only focusing on the Roberta Bundler API that exposes this model to users.
* compute relative position buckets for relative attention bias [ghstack-poisoned]
* compute relative position bias for t5 attention [ghstack-poisoned]
* compute attention scores for t5 model using relative attention bias [ghstack-poisoned]
* perform multihead attention using relative attention bias for t5 model [ghstack-poisoned]
* create T5MultiheadAttention module [ghstack-poisoned]
* add layer norm module for t5 model [ghstack-poisoned]
* add t5 layer module that can be used for both encoder or decoder stack [ghstack-poisoned]
* add t5 stack that can function as either the encoder or decoder of a t5 model [ghstack-poisoned]
* Update base for Update on "add t5 model that can function as both encodery-only or encoder-decoder model" [ghstack-poisoned]
* Update base for Update on "add t5 model that can function as both encodery-only or encoder-decoder model" [ghstack-poisoned]
* Update base for Update on "add t5 model that can function as both encodery-only or encoder-decoder model" [ghstack-poisoned]
* Update base for Update on "add t5 model that can function as both encodery-only or encoder-decoder model" [ghstack-poisoned]
* Update base for Update on "add t5 model that can function as both encodery-only or encoder-decoder model" [ghstack-poisoned]
* Update base for Update on "add t5 model that can function as both encodery-only or encoder-decoder model" [ghstack-poisoned]
* Update base for Update on "add t5 model that can function as both encodery-only or encoder-decoder model" [ghstack-poisoned]
* add t5 model that can function as both encodery-only or encoder-decoder model (#1829)
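For readers unfamiliar with the "relative position buckets" named in the first commit of this stack, here is a minimal pure-Python sketch of the bucketing scheme generally associated with T5. It is not the code from this PR: the function name `relative_position_bucket`, the scalar (non-tensor) form, and the defaults `num_buckets=32` and `max_distance=128` are assumptions made for illustration, and the PR's tensor implementation may differ in detail.

```python
import math


def relative_position_bucket(relative_position, bidirectional=True,
                             num_buckets=32, max_distance=128):
    """Map a signed offset (key position minus query position) to a bucket id.

    Hypothetical scalar helper for illustration only. Small offsets get one
    bucket each; larger offsets share logarithmically sized buckets, and
    anything at or beyond max_distance falls into the last bucket.
    """
    bucket = 0
    if bidirectional:
        # Split the buckets in half: one half for keys after the query,
        # the other half for keys at or before the query.
        num_buckets //= 2
        if relative_position > 0:
            bucket += num_buckets
        distance = abs(relative_position)
    else:
        # Causal case: keys after the query all collapse to distance 0.
        distance = -min(relative_position, 0)

    max_exact = num_buckets // 2
    if distance < max_exact:
        # Exact bucket for small distances.
        return bucket + distance

    # Logarithmic bucket for larger distances, clipped to the last bucket.
    scaled = math.log(distance / max_exact) / math.log(max_distance / max_exact)
    log_bucket = max_exact + int(scaled * (num_buckets - max_exact))
    return bucket + min(log_bucket, num_buckets - 1)
```

In a full attention layer, each bucket id would index a learned embedding that is added to the attention scores as a bias; that step is omitted here.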
Stack from ghstack (oldest at bottom):