This repository was archived by the owner on Sep 10, 2025. It is now read-only.

Conversation

@pmabbo13 pmabbo13 requested a review from Nayef211 July 13, 2022 18:15
Contributor

@Nayef211 Nayef211 left a comment


Leave a comment noting any differences from the original implementation.

@pmabbo13
Contributor Author

Leave a comment noting any differences from the original implementation.

updated!

@pmabbo13
Copy link
Contributor Author

Description

The T5Stack implementation is very similar to the nn.TransformerEncoder and nn.TransformerDecoder implementations. The main differences are that:

  1. T5Stack generalizes so it can be used as either an encoder or a decoder. This depends on the value passed in for is_decoder, which dictates whether the stack's layers are configured as encoder layers or decoder layers.
  2. T5Stack returns the output of the final layer, the output of every layer in the stack, the self-attention scores computed at every layer in the stack, and the cross-attention scores at every layer of the stack (these will be None if the stack is an encoder, since only decoder layers perform cross-attention).
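The two points above can be sketched as follows. This is a minimal illustrative toy, not the actual torchtext T5Stack: the layer internals (and names like `ToyLayer`/`ToyStack`) are hypothetical stand-ins, but the control flow shows how a single stack class can act as encoder or decoder via `is_decoder` and collect per-layer outputs and attention scores.

```python
# Hypothetical sketch of a T5Stack-style module (NOT the torchtext code).
import torch
import torch.nn as nn


class ToyLayer(nn.Module):
    """Stand-in for a T5 encoder/decoder layer (illustrative only)."""

    def __init__(self, d_model: int, is_decoder: bool):
        super().__init__()
        self.is_decoder = is_decoder
        self.self_attn = nn.MultiheadAttention(d_model, num_heads=2, batch_first=True)
        # Cross-attention exists only when the layer is part of a decoder.
        self.cross_attn = (
            nn.MultiheadAttention(d_model, num_heads=2, batch_first=True)
            if is_decoder
            else None
        )
        self.ff = nn.Linear(d_model, d_model)

    def forward(self, x, memory=None):
        out, self_scores = self.self_attn(x, x, x, need_weights=True)
        cross_scores = None
        if self.is_decoder and memory is not None:
            out, cross_scores = self.cross_attn(out, memory, memory, need_weights=True)
        return self.ff(out), self_scores, cross_scores


class ToyStack(nn.Module):
    """One class serves as encoder or decoder depending on `is_decoder`
    (point 1), and returns the final output plus every layer's output,
    self-attention scores, and cross-attention scores (point 2)."""

    def __init__(self, d_model: int, num_layers: int, is_decoder: bool):
        super().__init__()
        self.is_decoder = is_decoder
        self.layers = nn.ModuleList(
            ToyLayer(d_model, is_decoder) for _ in range(num_layers)
        )

    def forward(self, x, memory=None):
        all_outputs, all_self_scores, all_cross_scores = [], [], []
        for layer in self.layers:
            x, self_scores, cross_scores = layer(x, memory)
            all_outputs.append(x)
            all_self_scores.append(self_scores)
            # Cross-attention scores stay None for an encoder stack.
            all_cross_scores.append(cross_scores)
        return x, all_outputs, all_self_scores, all_cross_scores
```

For example, an encoder stack yields `None` for every cross-attention entry, while a decoder stack given encoder `memory` populates them.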

@pmabbo13 pmabbo13 requested review from abhinavarora and parmeet July 15, 2022 15:58
Contributor

@parmeet parmeet left a comment


LGTM!

@pmabbo13 pmabbo13 merged commit 80753de into gh/pmabbo13/8/base Jul 18, 2022
@facebook-github-bot facebook-github-bot deleted the gh/pmabbo13/8/head branch August 18, 2022 14:20


5 participants