@shbiswas834
No description provided.

@meta-cla meta-cla bot added the cla signed label Nov 13, 2025
@meta-codesync
Contributor

meta-codesync bot commented Nov 14, 2025

@spcyppt has imported this pull request. If you are a Meta employee, you can view this in D87103224.

num_cols = num_cols_group[0];
warps_per_row = (num_cols + COLS_PER_WARP - 1) >> LOG_COLS_PER_WARP;
}
// USE_INDEX_SELECT is a template argument; the compiler prunes the unused branch.
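
A note on the idiom in the snippet above: adding `COLS_PER_WARP - 1` before shifting right by `LOG_COLS_PER_WARP` is a ceiling division by a power of two, so a partially filled last warp still gets counted. A minimal sketch, assuming a hypothetical warp width of 32 columns (the real constants live in the kernel source and are not shown in this thread); `warps_per_row_for` is an illustrative name, not a function from the PR:

```cpp
// Assumed values for illustration only; the kernel defines its own constants.
constexpr int LOG_COLS_PER_WARP = 5;
constexpr int COLS_PER_WARP = 1 << LOG_COLS_PER_WARP;  // 32 columns per warp

// Ceiling division by a power of two: round up by adding (divisor - 1),
// then divide with a right shift instead of an integer division.
constexpr int warps_per_row_for(int num_cols) {
  return (num_cols + COLS_PER_WARP - 1) >> LOG_COLS_PER_WARP;
}

// Compile-time checks of the rounding behavior.
static_assert(warps_per_row_for(1) == 1, "a single column still needs one warp");
static_assert(warps_per_row_for(32) == 1, "exactly one full warp");
static_assert(warps_per_row_for(33) == 2, "one extra column spills into a second warp");
```

The shift only matches a true division when `COLS_PER_WARP` is exactly `1 << LOG_COLS_PER_WARP`, which is why the two constants are defined together here.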
Contributor

Please guard all of these changes so that they apply to ROCm only. We benchmarked them, and they regress performance on NVIDIA GPUs.

Author

@shbiswas834 shbiswas834 Nov 19, 2025

Restored the original code and guarded the current changes in 76b963d.
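
For context, one common way to keep a tuned path platform-specific while preserving the original behavior elsewhere is a preprocessor guard. The sketch below is not the code from 76b963d; it assumes the build defines a `USE_ROCM` macro for AMD builds, and `original_path`, `rocm_tuned_path`, and `dispatch` are hypothetical stand-ins for the two kernel variants:

```cpp
// Assumption: the build system defines USE_ROCM when compiling for AMD GPUs.
#ifdef USE_ROCM
constexpr bool kIsRocmBuild = true;
#else
constexpr bool kIsRocmBuild = false;
#endif

// Hypothetical stand-ins for the unchanged code and the ROCm-tuned changes.
constexpr int original_path(int x) { return x + 1; }
constexpr int rocm_tuned_path(int x) { return x + 2; }

// Only ROCm builds compile the tuned path; every other build keeps the
// original code, so NVIDIA behavior and performance are left untouched.
constexpr int dispatch(int x) {
#ifdef USE_ROCM
  return rocm_tuned_path(x);
#else
  return original_path(x);
#endif
}
```

Because the choice is made by the preprocessor, the non-selected variant adds no runtime branch to the build that does not use it.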

@shbiswas834 shbiswas834 force-pushed the shbiswas/group_index_bwd branch from 4f41fdf to 1f28387 Compare November 19, 2025 05:11
@shbiswas834 shbiswas834 force-pushed the shbiswas/group_index_bwd branch from 1f28387 to 1063c0c Compare November 19, 2025 18:19

3 participants