Closed
Labels
enhancement (New feature or request)
Description
Feature Description
In the "Attention is all you need" paper, the queries and keys share the same dimension d_k, while the values have their own dimension d_v.
It would be great to support different key and value lengths.
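For reference, the scaled dot-product attention from that paper can be written with the shapes spelled out. The symbols n (number of queries) and m (number of keys/values) are introduced here only for illustration:

```math
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V,
\qquad Q \in \mathbb{R}^{n \times d_k},\quad K \in \mathbb{R}^{m \times d_k},\quad V \in \mathbb{R}^{m \times d_v}
```

The score matrix is n × m and the output is n × d_v, so d_k only has to match between queries and keys; nothing in the formula requires d_k = d_v.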
Motivation
Some upcoming models employ different key lengths than value lengths.
Possible Implementation
Other than plumbing to get these new values, the places where n_embd, n_embd_gqa, n_embd_head, n_rot, and n_head_kv are used would need to be audited to make sure the assumptions are still sane.
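To make the kind of audit described above concrete, here is a minimal Python sketch of how the derived sizes might split once key and value head lengths are separate. The names `n_embd_head_k`, `n_embd_head_v`, `n_embd_k_gqa`, `n_embd_v_gqa`, and `check_hparams` are hypothetical and chosen only for illustration, as are the example numbers; this is not the project's actual code.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class HParams:
    n_embd: int         # model embedding width
    n_head: int         # number of query heads
    n_head_kv: int      # number of key/value heads (grouped-query attention)
    n_embd_head_k: int  # hypothetical: per-head key (and query) length
    n_embd_head_v: int  # hypothetical: per-head value length
    n_rot: int          # number of rotary dimensions per head

    def n_embd_k_gqa(self) -> int:
        # Total key width across all KV heads.
        return self.n_embd_head_k * self.n_head_kv

    def n_embd_v_gqa(self) -> int:
        # Total value width across all KV heads.
        return self.n_embd_head_v * self.n_head_kv


def check_hparams(hp: HParams) -> List[str]:
    """Return a list of violated assumptions (empty if none were found)."""
    problems = []
    if hp.n_head_kv <= 0 or hp.n_head % hp.n_head_kv != 0:
        problems.append("n_head must be a positive multiple of n_head_kv")
    if hp.n_rot > hp.n_embd_head_k:
        # Assumption: rotary embeddings are applied to queries and keys only.
        problems.append("n_rot must not exceed the key head length")
    return problems


# Illustrative numbers only, not taken from any real model.
hp = HParams(n_embd=4096, n_head=32, n_head_kv=8,
             n_embd_head_k=192, n_embd_head_v=128, n_rot=64)

print(check_hparams(hp))                     # list of violated assumptions
print(hp.n_embd_k_gqa(), hp.n_embd_v_gqa())  # key and value widths now differ
print(hp.n_embd // hp.n_head)                # the old single head length no longer matches either
```

With these numbers, code that derives one head length as `n_embd / n_head` and reuses it for both keys and values would size K and V buffers incorrectly, which is the sort of assumption the audit would need to catch.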