This repository was archived by the owner on Sep 10, 2025. It is now read-only.

Conversation

@pmabbo13 pmabbo13 requested a review from Nayef211 July 13, 2022 18:15
@Nayef211
Contributor

This is still a relatively large PR and it's a bit hard to tell which lines were added by you. It might be worthwhile leaving a comment on the PR highlighting the lines that are added here and are different from the original implementation (like you did in #1812).

    dropout_p = 0.0
else:
    dropout_p = self.dropout

Contributor Author

@pmabbo13 pmabbo13 Jul 13, 2022


Lines 256-274 encapsulate the changes made relative to torch.nn.functional.multi_head_attention_forward to include relative attention bias. position_bias is then also added as a return value in lines 393 and 400.
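
For context, here is a minimal sketch of the pattern being described: the position bias is added to the raw attention scores before the softmax, and it is returned alongside the attention output so subsequent layers can reuse it. The function name and tensor shapes below are illustrative assumptions, not the PR's exact code.

import torch
import torch.nn.functional as F

# Illustrative sketch only (not the PR's exact code).
def attention_with_position_bias(q, k, v, position_bias):
    # q, k, v:       (batch_size, num_heads, seq_len, head_dim)
    # position_bias: (1 or batch_size, num_heads, seq_len, seq_len)
    scores = torch.matmul(q, k.transpose(-2, -1))  # (batch_size, num_heads, seq_len, seq_len)
    scores = scores + position_bias                # fold in the relative attention bias
    attn_weights = F.softmax(scores, dim=-1)
    attn_output = torch.matmul(attn_weights, v)    # (batch_size, num_heads, seq_len, head_dim)
    return attn_output, position_bias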

@pmabbo13
Contributor Author

Description

The forward method for T5MultiheadAttention is a modified version of nn.functional.multi_head_attention_forward, meant to perform multihead attention with relative attention bias on the input query, key, and value tensors. The main modifications are as follows:

  1. Parameters needed to compute relative attention bias are added as input arguments (see the sketch below).
  2. Non-core functionalities are removed, such as add_zero_attn (which adds a new batch of zeros to the key and value sequences at dim=1) and the addition of bias terms to the key and value projections.
  3. The nn.functional.multi_head_attention_forward method reshapes q, k, and v to be 3D. This led to discrepancies with the decoder outputs of the HF implementation when the input decoder sequence had a batch size larger than 1. Reshaping these tensors to 4D (as in the HF implementation) resolved the issue.

These changes are best visible via this commit.
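
As a rough illustration of item 1, the sketch below shows one way a relative attention bias tensor can be produced from an embedding table. The helper name is hypothetical and the bucketing is deliberately simplified (plain clamping rather than T5's logarithmic buckets), so this is a sketch of the general technique rather than the PR's implementation.

import torch
import torch.nn as nn

# Hypothetical helper; the bucketing here is a simplified stand-in.
def compute_position_bias(relative_attention_bias: nn.Embedding,
                          query_length: int, key_length: int) -> torch.Tensor:
    # relative_attention_bias is assumed to be nn.Embedding(num_buckets, num_heads)
    context_position = torch.arange(query_length)[:, None]
    memory_position = torch.arange(key_length)[None, :]
    relative_position = memory_position - context_position        # (query_length, key_length)
    num_buckets = relative_attention_bias.num_embeddings
    # Clamp relative positions into the embedding range (T5 uses log-scale buckets instead).
    buckets = relative_position.clamp(-(num_buckets // 2), num_buckets // 2 - 1) + num_buckets // 2
    values = relative_attention_bias(buckets)                      # (query_length, key_length, num_heads)
    return values.permute(2, 0, 1).unsqueeze(0)                    # (1, num_heads, query_length, key_length)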

@pmabbo13 pmabbo13 requested review from abhinavarora and parmeet July 15, 2022 15:57
if key_padding_mask is not None and key_padding_mask.dtype == torch.uint8:
    warnings.warn("Byte tensor for key_padding_mask is not supported. Using bool tensor instead.")
    key_padding_mask = key_padding_mask.to(torch.bool)

Contributor Author


We were having issues with the original implementation, where the q, k, v tensors were reshaped to (batch_size * num_heads, seq_len, head_dim). This led to outputs that differed from the HF output when the input sequence to the decoder had a batch size larger than 1. The resolution was to reshape these into 4D tensors of shape (batch_size, num_heads, seq_len, head_dim), which is the same reshaping done in the HF implementation.
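
To make the shape difference concrete, here is a small self-contained comparison of the two reshapes. The starting (seq_len, batch_size, embed_dim) layout and the variable names are assumptions for illustration, not the PR's exact code.

import torch

batch_size, num_heads, seq_len, head_dim = 2, 8, 5, 64
q = torch.randn(seq_len, batch_size, num_heads * head_dim)  # assumed (seq_len, batch_size, embed_dim) layout

# 3D reshape as done in torch.nn.functional.multi_head_attention_forward:
q_3d = q.reshape(seq_len, batch_size * num_heads, head_dim).transpose(0, 1)
# -> (batch_size * num_heads, seq_len, head_dim)

# 4D reshape matching the HF-style implementation:
q_4d = q.reshape(seq_len, batch_size, num_heads, head_dim).permute(1, 2, 0, 3)
# -> (batch_size, num_heads, seq_len, head_dim)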

Contributor

@parmeet parmeet left a comment


Overall LGTM!

Contributor

@Nayef211 Nayef211 left a comment


LGTM, thanks for adding a detailed description of the changes w.r.t. the original implementation and for resolving all PR comments!

@pmabbo13 pmabbo13 merged commit 283590d into gh/pmabbo13/13/base Jul 18, 2022
@facebook-github-bot facebook-github-bot deleted the gh/pmabbo13/13/head branch August 18, 2022 14:20