
Conversation

@cthi (Contributor) commented Sep 4, 2025

Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/1848

Re-tune MXFP8 grouped gemm with the tuning tooling to autogenerate a heuristic on B200 at peak 750W.

  • We see peak TFLOPS of 1963, so we can get near roofline for some MNK shapes, which is nice. Some shapes still fall short, so there is likely room for further improvement, but let's get something out the door first.
  • Remove some unnecessary template code.

Differential Revision: D81683544

netlify bot commented Sep 4, 2025

Deploy Preview for pytorch-fbgemm-docs ready!

| Name | Link |
|------|------|
| 🔨 Latest commit | 89d43b1 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/pytorch-fbgemm-docs/deploys/68b9fef4d828c000072751b0 |
| 😎 Deploy Preview | https://deploy-preview-4821--pytorch-fbgemm-docs.netlify.app |

@meta-cla meta-cla bot added the cla signed label Sep 4, 2025
@facebook-github-bot (Contributor) commented:

This pull request was exported from Phabricator. Differential Revision: D81683544

Summary:

X-link: facebookresearch/FBGEMM#1848

[Re-tune MXFP8 grouped gemm](https://docs.google.com/spreadsheets/d/1xk8h1OZFnvKyH7kpP-FFZmXPu5Tmv1AoJ7--KEX4pXg/edit?gid=0#gid=0) with the tuning tooling to autogenerate a heuristic on B200 at peak 750W.
- We see peak TFLOPS of ~2K; some shapes still fall short of roofline, so there is likely room for further improvement. A rough TFLOPS calculation is sketched below.
- Roughly 1.5-2x improvement over the BF16 grouped gemm baseline.
- Roughly 1.1-1.3x improvement over the old heuristic.
- Note: Blackwell is rather finicky to benchmark, and I noticed a decent amount of variation between runs, but this looks better overall, so we can ship it for now.

Also remove some unnecessary template code.
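
To make the roofline comparison concrete, here is a minimal sketch of how achieved TFLOPS for a grouped gemm is typically computed from the per-group (M, N, K) shapes and a measured wall time. The group shapes, elapsed time, and assumed peak value are placeholders for illustration, not numbers from this tuning run.

```python
# Minimal sketch: achieved TFLOPS for a grouped gemm vs. an assumed roofline.
# The group shapes, elapsed time, and peak value are placeholders, not PR data.

def grouped_gemm_tflops(group_shapes, elapsed_s):
    """group_shapes: iterable of (M, N, K) per group; elapsed_s: measured seconds."""
    flops = sum(2 * m * n * k for m, n, k in group_shapes)  # 2*M*N*K FLOPs per gemm
    return flops / elapsed_s / 1e12

ASSUMED_PEAK_TFLOPS = 2000.0          # placeholder MXFP8 roofline for B200 @ 750W
shapes = [(8192, 8192, 8192)] * 4     # example group sizes, not the tuned workload
achieved = grouped_gemm_tflops(shapes, elapsed_s=2.2e-3)
print(f"{achieved:.0f} TFLOPS ({100 * achieved / ASSUMED_PEAK_TFLOPS:.0f}% of assumed peak)")
```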

Differential Revision: D81683544
@facebook-github-bot (Contributor) commented:

This pull request was exported from Phabricator. Differential Revision: D81683544

@facebook-github-bot (Contributor) commented:

This pull request has been merged in 59b2bfd.

pytorchmergebot pushed a commit to pytorch/pytorch that referenced this pull request Sep 6, 2025
…ump (#162209)

## Summary
- We just landed 2d-2d support for mxfp8 grouped gemm in FBGEMM: pytorch/FBGEMM#4816
- This is needed for the backward pass of mxfp8 MoE training with grouped gemms
- Changes:
    - Add dispatching + input validation for mxfp8 grouped gemm in `torch._scaled_grouped_mm`
    - Add meta registration input validation for mxfp8 grouped gemm, for composability with compile
    - Add unit tests exercising `torch._scaled_grouped_mm` with mxfp8 inputs (a usage sketch follows this list)
    - Bump FBGEMM third party submodule to include:
          - pytorch/FBGEMM#4816
          - pytorch/FBGEMM#4820
          - pytorch/FBGEMM#4821
          - pytorch/FBGEMM#4823
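
For readers following the new path, below is a minimal usage sketch of `torch._scaled_grouped_mm` with mxfp8 inputs. The argument order, the scale layout (one e8m0 scale per 32-element block along K), and the output shape are assumptions on my part; the unit tests in `test/test_matmul_cuda.py` exercised in the test plan are the authoritative reference.

```python
# Hypothetical sketch of calling the mxfp8 grouped gemm path; argument order,
# scale layout, and shapes are assumptions -- see test_mxfp8_scaled_grouped_mm_*
# in test/test_matmul_cuda.py for the real call pattern.
import torch

device = "cuda"
G, M, N, K = 2, 256, 256, 512   # example sizes (multiples of the 32-wide MX block)
BLOCK = 32                      # MX formats scale 32-element blocks (assumption here)

# 2d-2d case: both operands are 2d, with groups delineated along K by `offs`.
a = torch.randn(M, K, device=device).to(torch.float8_e4m3fn)
b = torch.randn(N, K, device=device).to(torch.float8_e4m3fn).t()    # (K, N), column-major
offs = torch.tensor([K // G, K], device=device, dtype=torch.int32)  # group end offsets

# Dummy e8m0 scales, one per 32-element block along K (assumed layout).
# Byte 127 is the e8m0 encoding of 2^0 = 1.0.
scale_a = torch.full((M, K // BLOCK), 127, dtype=torch.uint8, device=device).view(torch.float8_e8m0fnu)
scale_b = torch.full((N, K // BLOCK), 127, dtype=torch.uint8, device=device).view(torch.float8_e8m0fnu)

out = torch._scaled_grouped_mm(a, b, scale_a, scale_b, offs=offs,
                               out_dtype=torch.bfloat16)
print(out.shape)  # expecting a (G, M, N) result for the 2d-2d case (assumption)
```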

#### How fbgemm dependency was bumped
Documenting this since I haven't found it documented elsewhere:
- `cd ~/pytorch/third_party/fbgemm`
- `git fetch`
- `git checkout <hash>`
- `cd ~/pytorch`
- `git add third_party/fbgemm`

## Test plan

#### Test build
```
USE_FBGEMM_GENAI=1 python -m pip install --no-build-isolation -v -e .
...
Successfully installed torch-2.9.0a0+gitf5070f3
```
[full build log](https://www.internalfb.com/phabricator/paste/view/P1933787581)

#### Unit tests
```
pytest test/test_matmul_cuda.py -k test_mxfp8_scaled_grouped_mm_
...

test/test_matmul_cuda.py .........                                                                                                                        [100%]

============================================================== 9 passed, 1668 deselected in 5.34s ===============================================================
```

Pull Request resolved: #162209
Approved by: https://github.com/ngimel
daisyden pushed a commit to daisyden/pytorch that referenced this pull request Sep 8, 2025
markc-614 pushed a commit to markc-614/pytorch that referenced this pull request Sep 17, 2025
mansiag05 pushed a commit to mansiag05/pytorch that referenced this pull request Sep 22, 2025
cleonard530 pushed a commit to cleonard530/pytorch that referenced this pull request Sep 22, 2025
dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025