Closed
Labels
Tensors (Issues relating to tensors: generic issues/questions or specific tensor tutorials), docathon-h1-2023 (A label for the docathon in H1 2023), medium
Description
In https://pytorch.org/tutorials/intermediate/named_tensor_tutorial.html, I think there is a bug:
dot_prod = q.div_(scale).matmul(k.align_to(..., 'D_head', 'T_key'))
[...]
attn_weights = self.attn_dropout(F.softmax(dot_prod / scale, dim='T_key'))
The division by scale is applied twice: once in q.div_(scale) and again in dot_prod / scale. I think it should be applied only once.
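For concreteness, here is a minimal self-contained sketch of what I mean; the shapes and dimension names are placeholder assumptions, not the tutorial's full MultiHeadAttention module. The actual fix would be to drop either the / scale from the softmax line or the .div_(scale) from the matmul line:

```python
import math

import torch
import torch.nn.functional as F

# Placeholder batch/head/sequence/head-dim sizes; names follow the tutorial.
q = torch.randn(2, 4, 5, 16, names=('N', 'H', 'T', 'D_head'))
k = torch.randn(2, 4, 7, 16, names=('N', 'H', 'T_key', 'D_head'))
scale = math.sqrt(q.size('D_head'))

# Divide by scale once, when forming the dot product...
dot_prod = q.div(scale).matmul(k.align_to(..., 'D_head', 'T_key'))

# ...and do not divide again before the softmax (dropout omitted here).
attn_weights = F.softmax(dot_prod, dim='T_key')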
Thanks.