
Conversation

@cthi (Contributor) commented Sep 4, 2025

Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/1848

Re-tune MXFP8 grouped gemm with the tuning tooling to autogenerate a heuristic on B200 at peak 750W.

  • We see peak TFLOPS of 1963, so we can get near roofline for some MNK shapes, which is nice. Some shapes still fall short, so there is likely room for further improvement, but let's get something out the door first.
  • Remove some unnecessary template code.

Differential Revision: D81683544

netlify bot commented Sep 4, 2025

Deploy Preview for pytorch-fbgemm-docs ready!

| Name | Link |
|------|------|
| 🔨 Latest commit | 89d43b1 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/pytorch-fbgemm-docs/deploys/68b9fef4d828c000072751b0 |
| 😎 Deploy Preview | https://deploy-preview-4821--pytorch-fbgemm-docs.netlify.app |

@meta-cla meta-cla bot added the cla signed label Sep 4, 2025
@facebook-github-bot (Contributor) commented:

This pull request was exported from Phabricator. Differential Revision: D81683544

Summary:

X-link: facebookresearch/FBGEMM#1848

[Re-tune MXFP8 grouped gemm](https://docs.google.com/spreadsheets/d/1xk8h1OZFnvKyH7kpP-FFZmXPu5Tmv1AoJ7--KEX4pXg/edit?gid=0#gid=0) with the tuning tooling to autogenerate a heuristic on B200 at peak 750W.
- We see peak TFLOPS of ~2K; some shapes still fall short of roofline, so there is likely room for further improvement. A rough TFLOPS calculation is sketched below.
- Roughly 1.5-2x improvement over the BF16 grouped gemm baseline.
- Roughly 1.1-1.3x improvement over the old heuristic.
- Note: Blackwell is rather finicky to benchmark, and I noticed a decent amount of variation between runs, but this looks better overall, so we can ship it for now.

Also remove some unnecessary template code.
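
To make the roofline comparison concrete, here is a minimal sketch of how achieved TFLOPS for a grouped gemm is typically computed from the per-group (M, N, K) shapes and a measured wall time. The group shapes, elapsed time, and assumed peak value are placeholders for illustration, not numbers from this tuning run.

```python
# Minimal sketch: achieved TFLOPS for a grouped gemm vs. an assumed roofline.
# The group shapes, elapsed time, and peak value are placeholders, not PR data.

def grouped_gemm_tflops(group_shapes, elapsed_s):
    """group_shapes: iterable of (M, N, K) per group; elapsed_s: measured seconds."""
    flops = sum(2 * m * n * k for m, n, k in group_shapes)  # 2*M*N*K FLOPs per gemm
    return flops / elapsed_s / 1e12

ASSUMED_PEAK_TFLOPS = 2000.0          # placeholder MXFP8 roofline for B200 @ 750W
shapes = [(8192, 8192, 8192)] * 4     # example group sizes, not the tuned workload
achieved = grouped_gemm_tflops(shapes, elapsed_s=2.2e-3)
print(f"{achieved:.0f} TFLOPS ({100 * achieved / ASSUMED_PEAK_TFLOPS:.0f}% of assumed peak)")
```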

Differential Revision: D81683544
@facebook-github-bot (Contributor) commented:

This pull request was exported from Phabricator. Differential Revision: D81683544

@facebook-github-bot (Contributor) commented:

This pull request has been merged in 59b2bfd.

pytorchmergebot pushed a commit to pytorch/pytorch that referenced this pull request Sep 6, 2025
…ump (#162209)

## Summary
- We just landed 2d-2d support for mxfp8 grouped gemm in FBGEMM: pytorch/FBGEMM#4816
- This is needed for the backward pass of mxfp8 MoE training with grouped gemms
- Changes:
    - Add dispatching + input validation for mxfp8 grouped gemm in `torch._scaled_grouped_mm`
    - Add meta registration input validation for mxfp8 grouped gemm, for composability with compile
    - Add unit tests exercising `torch._scaled_grouped_mm` with mxfp8 inputs (a usage sketch follows this list)
    - Bump FBGEMM third party submodule to include:
          - pytorch/FBGEMM#4816
          - pytorch/FBGEMM#4820
          - pytorch/FBGEMM#4821
          - pytorch/FBGEMM#4823
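
For readers following the new path, below is a minimal usage sketch of `torch._scaled_grouped_mm` with mxfp8 inputs. The argument order, the scale layout (one e8m0 scale per 32-element block along K), and the output shape are assumptions on my part; the unit tests in `test/test_matmul_cuda.py` exercised in the test plan are the authoritative reference.

```python
# Hypothetical sketch of calling the mxfp8 grouped gemm path; argument order,
# scale layout, and shapes are assumptions -- see test_mxfp8_scaled_grouped_mm_*
# in test/test_matmul_cuda.py for the real call pattern.
import torch

device = "cuda"
G, M, N, K = 2, 256, 256, 512   # example sizes (multiples of the 32-wide MX block)
BLOCK = 32                      # MX formats scale 32-element blocks (assumption here)

# 2d-2d case: both operands are 2d, with groups delineated along K by `offs`.
a = torch.randn(M, K, device=device).to(torch.float8_e4m3fn)
b = torch.randn(N, K, device=device).to(torch.float8_e4m3fn).t()    # (K, N), column-major
offs = torch.tensor([K // G, K], device=device, dtype=torch.int32)  # group end offsets

# Dummy e8m0 scales, one per 32-element block along K (assumed layout).
# Byte 127 is the e8m0 encoding of 2^0 = 1.0.
scale_a = torch.full((M, K // BLOCK), 127, dtype=torch.uint8, device=device).view(torch.float8_e8m0fnu)
scale_b = torch.full((N, K // BLOCK), 127, dtype=torch.uint8, device=device).view(torch.float8_e8m0fnu)

out = torch._scaled_grouped_mm(a, b, scale_a, scale_b, offs=offs,
                               out_dtype=torch.bfloat16)
print(out.shape)  # expecting a (G, M, N) result for the 2d-2d case (assumption)
```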

#### How fbgemm dependency was bumped
Documenting this since I haven't found it documented elsewhere:
- `cd ~/pytorch/third_party/fbgemm`
- `git fetch`
- `git checkout <hash>`
- `cd ~/pytorch`
- `git add third_party/fbgemm`

## Test plan

#### Test build
```
USE_FBGEMM_GENAI=1 python -m pip install --no-build-isolation -v -e .
...
Successfully installed torch-2.9.0a0+gitf5070f3
```
[full build log](https://www.internalfb.com/phabricator/paste/view/P1933787581)

#### Unit tests
```
pytest test/test_matmul_cuda.py -k test_mxfp8_scaled_grouped_mm_
...

test/test_matmul_cuda.py .........                                                                                                                        [100%]

============================================================== 9 passed, 1668 deselected in 5.34s ===============================================================
```

Pull Request resolved: #162209
Approved by: https://github.com/ngimel
daisyden pushed a commit to daisyden/pytorch that referenced this pull request Sep 8, 2025
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
cleonard530 pushed a commit to cleonard530/pytorch that referenced this pull request Sep 22, 2025
dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025