NHD layout #603
@petercad pls review
@jiyang1011 @taozha2 @tdeng5 pls review
Pull Request Overview
This PR adds support for NHD (seq_len, num_heads, head_dim) layout in addition to the existing HND (num_heads, seq_len, head_dim) layout for the BMG flash attention example. The NHD layout is commonly used in VLLM/sglang frameworks and is set as the new default.
Key changes:
- Added --layout command-line option with validation for "NHD" and "HND" values
- Updated stride calculations to support both layout formats
- Modified the verification function to handle layout-specific tensor indexing and data reordering
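As a rough illustration of the first two changes, the sketch below validates a --layout option and derives element strides for each layout. It is not the code from this PR (the example itself is not Python): the function names `make_parser` and `element_strides` are hypothetical, and the tensor is assumed to be densely packed with only the last three dimensions considered.

```python
import argparse


def make_parser():
    """Build a parser with a validated --layout option (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="layout option sketch")
    # NHD is the default, matching the PR description.
    parser.add_argument("--layout", choices=["NHD", "HND"], default="NHD")
    return parser


def element_strides(layout, seq_len, num_heads, head_dim):
    """Return (stride_seq, stride_head, stride_dim) in elements.

    Assumes a densely packed tensor with head_dim as the innermost dimension.
    """
    if layout == "NHD":
        # Memory order: (seq_len, num_heads, head_dim)
        return (num_heads * head_dim, head_dim, 1)
    if layout == "HND":
        # Memory order: (num_heads, seq_len, head_dim)
        return (head_dim, seq_len * head_dim, 1)
    raise ValueError(f"unsupported layout: {layout!r}")


if __name__ == "__main__":
    args = make_parser().parse_args()
    print(args.layout, element_strides(args.layout, seq_len=4, num_heads=2, head_dim=8))
```

In both layouts head_dim stays contiguous; only the seq_len and num_heads strides swap roles.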
Co-authored-by: Copilot <[email protected]>
@copilot open a new pull request to apply changes based on the comments in this thread
@rolandschulz pls review and merge
@sunjiweiswift did you observe any perf gains with HND over NHD when low precision (e.g., FP8) is enabled? FlashInfer says it's more friendly for GPU implementation.
@hshen14
There is not much difference in the case of BF16. FP8 and other low-precision types are not currently supported.
NHD: the last 3 dimensions are organized as (seq_len, num_heads, head_dim).
HND: the last 3 dimensions are organized as (num_heads, seq_len, head_dim).
In VLLM/sglang, NHD is the more commonly used format. Support for NHD has been added in the release PR.
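To make the two definitions concrete, here is a minimal sketch of how the same logical element (s, h, d) maps to a flat offset in each layout, plus a reordering from NHD to HND of the kind a reference check might need. The helpers `flat_index` and `nhd_to_hnd` are hypothetical names for illustration, not functions from this PR, and a dense buffer without a batch dimension is assumed.

```python
def flat_index(layout, s, h, d, seq_len, num_heads, head_dim):
    """Flat offset of logical element (s, h, d) in a dense buffer."""
    if layout == "NHD":
        # (seq_len, num_heads, head_dim): heads of one token are adjacent
        return (s * num_heads + h) * head_dim + d
    if layout == "HND":
        # (num_heads, seq_len, head_dim): tokens of one head are adjacent
        return (h * seq_len + s) * head_dim + d
    raise ValueError(f"unsupported layout: {layout!r}")


def nhd_to_hnd(buf, seq_len, num_heads, head_dim):
    """Copy a flat NHD buffer into a new flat buffer in HND order."""
    out = [None] * len(buf)
    for s in range(seq_len):
        for h in range(num_heads):
            for d in range(head_dim):
                src = flat_index("NHD", s, h, d, seq_len, num_heads, head_dim)
                dst = flat_index("HND", s, h, d, seq_len, num_heads, head_dim)
                out[dst] = buf[src]
    return out


if __name__ == "__main__":
    nhd = list(range(2 * 3 * 2))  # seq_len=2, num_heads=3, head_dim=2
    print(nhd_to_hnd(nhd, seq_len=2, num_heads=3, head_dim=2))
```

After reordering, looking up any (s, h, d) with the HND formula returns the same value the NHD formula returns on the original buffer.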