Renamed bsz to bs for consistency; removed dead code #299
Conversation
Attributes:
    n_kv_heads (int): Number of key and value heads.
    n_heads (int): Number of query heads.
    n_local_kv_heads (int): Number of local key and value heads.
not an attribute (only one occurrence of n_local_kv_heads if you search in this file)
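For context, a minimal sketch of what the docstring could look like once the stray entry is dropped; the class name and the remaining attribute entries are assumed from the diff context above, not copied from the actual torchtitan source:

```python
# Hedged sketch: only the Attributes entries visible in the diff above are kept;
# n_local_kv_heads is removed because it is not actually an attribute of the class.
class Attention:
    """Multi-head attention module.

    Attributes:
        n_kv_heads (int): Number of key and value heads.
        n_heads (int): Number of query heads.
    """
```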
torch.Tensor: Output tensor after attention.
"""
bsz, seqlen, _ = x.shape
all inline comments in this method use bs for batch size, so we can make this bs for consistency
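A minimal sketch of the suggested rename; the helper name is hypothetical and only the unpacking line reflects the change being discussed:

```python
import torch

def attention_input_shape(x: torch.Tensor) -> tuple[int, int]:
    # Use `bs` for the batch dimension so it matches inline shape comments
    # used elsewhere in the method, e.g. "# (bs, seqlen, dim)".
    bs, seqlen, _ = x.shape
    return bs, seqlen

x = torch.randn(2, 16, 64)       # (bs, seqlen, dim)
print(attention_input_shape(x))  # (2, 16)
```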
torch.Tensor: Output logits after applying the Transformer model.
"""
_bsz, seqlen = tokens.shape
similarly, _bsz is unused, so just remove it
if it helps readability to know the tokens.shape is (batch size, sequence length), I can keep it and maybe rename it to _bs?
Although not used, it improves code readability -- it tells how many dimensions tokens has, and what they are. So IMO I'd prefer to keep them. Also, the "unusedness" has been indicated using the _ prefix.
> if it helps readability to know the tokens.shape is (batch size, sequence length), I can keep it and maybe rename it to _bs?
just saw this message, yeah I agree
changed it to _bs
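For illustration, a minimal sketch of the agreed-upon final form; the function name here is hypothetical, and only the unpacking line reflects the change:

```python
import torch

def transformer_forward_shape(tokens: torch.Tensor) -> int:
    # The batch size is not used below, but keeping the unpacking documents
    # that `tokens` is (batch size, sequence length); the leading underscore
    # marks the variable as intentionally unused.
    _bs, seqlen = tokens.shape
    return seqlen

tokens = torch.zeros(4, 128, dtype=torch.long)  # (bs, seqlen)
print(transformer_forward_shape(tokens))        # 128
```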
tianyu-l left a comment
One comment inline.
some minor cleanups [ghstack-poisoned]
ghstack-source-id: bbedad3 Pull Request resolved: pytorch#299
Stack from ghstack (oldest at bottom):
- Renamed bsz to bs for consistency; removed dead code #299
- some minor cleanups