computing attention scores using relative attention bias #1832
WIP PR to workshop implementation: #1812
Description
Having computed the relative attention bias term, this method computes the attention scores. The implementation is very similar to nn.functional._scaled_dot_product_attention, except that we pass in
Since the input tensors to this function are 4-dimensional, we replace the
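For readers following along, here is a minimal sketch of the idea described above. It is not the code in this PR. The function name `attention_with_relative_bias`, its signature, the `(batch, heads, seq_len, head_dim)` layout, and the `1 / sqrt(head_dim)` scaling are all assumptions made for illustration. It only shows how a precomputed bias could be added to the scores of 4-dimensional inputs using `torch.matmul`, which batches over the leading dimensions.

```python
import math
from typing import Tuple

import torch


def attention_with_relative_bias(
    q: torch.Tensor,
    k: torch.Tensor,
    v: torch.Tensor,
    position_bias: torch.Tensor,
) -> Tuple[torch.Tensor, torch.Tensor]:
    """Hypothetical helper, not the PR's implementation.

    Assumed shapes:
      q:             (batch, heads, tgt_len, head_dim)
      k, v:          (batch, heads, src_len, head_dim)
      position_bias: broadcastable to (batch, heads, tgt_len, src_len)
    """
    head_dim = q.size(-1)
    # torch.matmul treats the leading (batch, heads) dims as batch dims,
    # so 4-D inputs need no reshaping to 3-D first.
    scores = torch.matmul(q, k.transpose(-2, -1))
    # Assumption: scale as in scaled dot-product attention.
    scores = scores / math.sqrt(head_dim)
    # Add the precomputed relative attention bias before the softmax.
    scores = scores + position_bias
    attn = torch.softmax(scores, dim=-1)
    out = torch.matmul(attn, v)
    return out, attn


torch.manual_seed(0)
q = torch.randn(2, 4, 5, 8)
k = torch.randn(2, 4, 7, 8)
v = torch.randn(2, 4, 7, 8)
bias = torch.randn(1, 4, 5, 7)  # shared across the batch via broadcasting
out, attn = attention_with_relative_bias(q, k, v, bias)
```

In this sketch the bias is given a leading dimension of 1 so that one bias tensor is broadcast across every example in the batch. Whether the actual method scales the scores or handles masks and dropout is not stated in the description above.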
parmeet left a comment
LGTM!
Stack from ghstack (oldest at bottom):
WIP PR to workshop implementation: #1812