Add subgroup matrix multiplication #80

junjihashimoto · 2025-09-05T04:21:55Z

Add subgroup matrix multiplication.
~~The kernel can be executed with the subgroupMatrixMultiplication function, but the results are incorrect. I am currently debugging.~~

> sysctl -n machdep.cpu.brand_string
Apple M4 Max
> MATMUL_VERSION=12 ./build/matmul  | grep -A 2 'Dispatching\|Exec'
matmul(50174,0x2016ee140) malloc: nano zone abandoned due to inability to reserve vm space.
[info] Dispatching Kernel version 12: f16: Subgroup matrix multiply with transpose, 30 iterations ...
[info] Copying result to CPU
[info]
--
Execution Time: (M = 4096, K = 4096, N = 8192) x 30 iterations :
40.0 milliseconds / dispatch ~ 6871.66 GFLOPS
================================================================================
> MATMUL_VERSION=11 ./build/matmul  | grep -A 2 'Dispatching\|Exec'
matmul(13932,0x2016ee140) malloc: nano zone abandoned due to inability to reserve vm space.
[info] Dispatching Kernel version 11: f16: 2D blocktiling with loop unrolling, vectorization and transpose, 30 iterations ...
[info] Copying result to CPU
[info]
--
Execution Time: (M = 4096, K = 4096, N = 8192) x 30 iterations :
26.6 milliseconds / dispatch ~ 10316.68 GFLOPS ## For comparison: this result does not use the subgroupMatrixMultiplication function.
================================================================================
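
The GFLOPS figures in the two logs above can be sanity-checked from the matrix dimensions and the per-dispatch time. The sketch below assumes the usual convention of counting 2·M·K·N floating-point operations per matrix multiply (one multiply and one add per inner-product term); the benchmark's own formula is not shown in this thread, so that convention and the helper name `matmul_gflops` are assumptions for illustration only.

```python
def matmul_gflops(m: int, k: int, n: int, seconds: float) -> float:
    """Throughput in GFLOPS, assuming 2*m*k*n floating-point operations per matmul."""
    flops = 2 * m * k * n  # one multiply + one add per inner-product term
    return flops / seconds / 1e9


# Dimensions and per-dispatch times copied from the logs above.
M, K, N = 4096, 4096, 8192
subgroup = matmul_gflops(M, K, N, 40.0e-3)  # kernel version 12 (subgroup matrix)
tiled = matmul_gflops(M, K, N, 26.6e-3)     # kernel version 11 (2D blocktiling)

print(f"version 12: {subgroup:.0f} GFLOPS")
print(f"version 11: {tiled:.0f} GFLOPS")
print(f"version 11 is {tiled / subgroup:.2f}x faster per dispatch")
```

Under that assumption both computed values land within 1% of the logged figures; the small gap is plausibly because the logs print the time rounded to 0.1 ms. On this run the non-subgroup kernel (version 11) is roughly 1.5x faster than the subgroup-matrix kernel (version 12).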

junjihashimoto · 2025-09-05T18:55:17Z

The main branch does not seem to output any shader compilation errors.

Add dev branch to CI

8f10387

junjihashimoto added 8 commits September 6, 2025 04:05

Add third_party/headers/webgpu to the INCLUDE path

c5f7a00

Fix dispatchKernel arguments in the examples

2b1767d

Add cmake-ci of github-actions

b8b4c58

Add libxinerama-dev, libxcursor-dev, libxi-dev, libgl-dev and libxcb-dev

bcb81e1

Fix a segmentation fault of wgpuBufferRelease

7abfedd

Fix the size of packed tensor

c2cdcd6

Add the subgroup matrix multiplication

74e1bd5

Bump dawn

4ef6361

junjihashimoto force-pushed the feature/matmul branch from 4d5d20b to 21b2f6f Compare September 22, 2025 09:32

junjihashimoto changed the base branch from main to dev September 22, 2025 09:33

Fix the wgsl code of subgroup-matrix-multiplication

3165df5

junjihashimoto force-pushed the feature/matmul branch from 21b2f6f to 3165df5 Compare September 22, 2025 09:34

junjihashimoto added 2 commits September 23, 2025 11:01

Apply loop-unrolling

e87d791

Disable chromium.subgroup_matrix_uniformity

ffbb983
