CUDA: topk-moe: add optional parameter for gpt-oss #16649

am17an · 2025-10-18T11:24:16Z

While looking at this kernel I realized that it is relatively easy to extend it to gpt-oss, which does the softmax after the top-k.
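To make the difference in ordering concrete, here is a minimal CPU reference sketch of the two routing variants. It is not the CUDA kernel from this PR: the names `topk_indices`, `softmax`, `Routing`, `route_softmax_then_topk` and `route_topk_then_softmax` are hypothetical, and the sketch assumes the only difference between the two paths is whether the softmax runs over all expert logits before selection or over the `k` selected logits afterwards.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Indices of the k largest values, in descending order of value.
// Assumes 0 < k <= x.size().
static std::vector<int> topk_indices(const std::vector<float>& x, int k) {
    std::vector<int> idx(x.size());
    std::iota(idx.begin(), idx.end(), 0);
    std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
                      [&](int a, int b) { return x[a] > x[b]; });
    idx.resize(k);
    return idx;
}

// Numerically stable softmax over the whole input vector.
static std::vector<float> softmax(const std::vector<float>& x) {
    const float max_val = *std::max_element(x.begin(), x.end());
    std::vector<float> out(x.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) {
        out[i] = std::exp(x[i] - max_val);
        sum += out[i];
    }
    for (float& v : out) {
        v /= sum;
    }
    return out;
}

// Hypothetical result type: selected expert ids and their weights.
struct Routing {
    std::vector<int>   ids;
    std::vector<float> weights;
};

// Early softmax: softmax over all experts, then keep the top-k probabilities.
static Routing route_softmax_then_topk(const std::vector<float>& logits, int k) {
    const std::vector<float> probs = softmax(logits);
    Routing r;
    r.ids = topk_indices(probs, k);
    for (int id : r.ids) {
        r.weights.push_back(probs[id]);
    }
    return r;
}

// Late softmax (the gpt-oss ordering described above): top-k on the raw
// logits, then softmax over only the k selected logits.
static Routing route_topk_then_softmax(const std::vector<float>& logits, int k) {
    Routing r;
    r.ids = topk_indices(logits, k);
    std::vector<float> selected;
    for (int id : r.ids) {
        selected.push_back(logits[id]);
    }
    r.weights = softmax(selected);
    return r;
}
```

Because softmax is monotonic, both variants select the same experts. Only the weights differ: with the late softmax the `k` weights sum to 1, while with the early softmax they sum to less than 1 unless they are renormalized separately.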

Performance on a 4090:

| Model | Test | t/s master | t/s cuda_gpt_oss_opt | Speedup |
| --- | --- | ---: | ---: | ---: |
| gpt-oss 20B MXFP4 MoE | tg32 | 170.99 | 177.68 | 1.04 |
| gpt-oss 20B MXFP4 MoE | tg64 | 168.75 | 175.36 | 1.04 |
| gpt-oss 20B MXFP4 MoE | tg128 | 167.01 | 173.33 | 1.04 |

avidwriter · 2025-10-22T12:10:00Z

How do I use this?

am17an · 2025-10-22T12:15:28Z

@avidwriter If you are using the CUDA backend, this is already included in the latest master.

am17an requested a review from slaren as a code owner October 18, 2025 11:24

github-actions bot added the testing, Nvidia GPU, and ggml labels Oct 18, 2025

am17an requested a review from JohannesGaessler October 18, 2025 11:28

jeffbolznv added a commit to jeffbolznv/llama.cpp that referenced this pull request Oct 18, 2025

vulkan: Update topk_moe fusion to handle gpt's late softmax

0111a34

Based on ggml-org#16649.

jeffbolznv mentioned this pull request Oct 18, 2025

vulkan: Update topk_moe fusion to handle gpt's late softmax #16656

Open

jeffbolznv added a commit to jeffbolznv/llama.cpp that referenced this pull request Oct 21, 2025

vulkan: Update topk_moe fusion to handle gpt's late softmax

34d4122

Based on ggml-org#16649.

am17an added 3 commits October 21, 2025 19:49

CUDA: topk-moe: add optional parameter for gpt-oss

5632159

add parameter to avoid runtime branch

2de54df

use ggml_can_fuse_subgraph

17c3927

am17an force-pushed the cuda_topk_moe_gpt_oss branch from 49a541e to 17c3927 Compare October 21, 2025 11:53

jeffbolznv added a commit to jeffbolznv/llama.cpp that referenced this pull request Oct 21, 2025

vulkan: Update topk_moe fusion to handle gpt's late softmax

b2d689a

Based on ggml-org#16649.

JohannesGaessler approved these changes Oct 21, 2025

am17an merged commit 03792ad into ggml-org:master Oct 21, 2025
70 checks passed

am17an deleted the cuda_topk_moe_gpt_oss branch October 21, 2025 15:21

ye-NX pushed a commit to ye-NX/llama.cpp that referenced this pull request Oct 21, 2025

CUDA: topk-moe: add optional parameter for gpt-oss (ggml-org#16649)

7869ac7

am17an mentioned this pull request Oct 22, 2025

CUDA: General GEMV fusion #16715

Merged

Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Oct 22, 2025

Revert "CUDA: topk-moe: add optional parameter for gpt-oss (ggml-org#…

46a1ef1

…16649)"

FMayran pushed a commit to FMayran/llama.cpp that referenced this pull request Oct 23, 2025

CUDA: topk-moe: add optional parameter for gpt-oss (ggml-org#16649)

54be1a7

pwilkin pushed a commit to pwilkin/llama.cpp that referenced this pull request Oct 23, 2025

CUDA: topk-moe: add optional parameter for gpt-oss (ggml-org#16649)

8c01a63

jeffbolznv added a commit to jeffbolznv/llama.cpp that referenced this pull request Oct 26, 2025

vulkan: Update topk_moe fusion to handle gpt's late softmax

6cccaef

Based on ggml-org#16649.
