
Conversation

reeselevine (Collaborator)

This PR adds:

  • Optional profiling for the host (CPU) and device (GPU) sides, which will be helpful when working on optimization.
  • A bump of the WebGPU Ubuntu CI testing image to 24.04, matching the Vulkan CI.
  • A rework of command submission that avoids global locks when pushing to staging vectors; instead, each thread submits its own work in graph_compute.
  • While working on this, I ran into deadlocks, either because the parameter buffer pool was exhausted or, in some cases, due to what I believe are underlying issues in Dawn's thread-safety approach. Serializing each command submission seems to avoid the problem, so I added options for that and left a TODO in the code to explore other solutions.
  • For now, I'm comfortable leaving it as is, because one of the next major goals is getting the WebGPU backend running fully in the browser, which will likely be in a single-threaded context to start, as mentioned in webgpu : fix build on emscripten #15826.

@reeselevine reeselevine requested a review from CISC as a code owner October 7, 2025 03:47
@github-actions bot added the devops (improvements to build systems and github actions) and ggml (changes relating to the ggml tensor library for machine learning) labels on Oct 7, 2025
CISC (Collaborator) commented Oct 7, 2025

BTW, consider adding yourself to CODEOWNERS.

CISC (Collaborator) commented Oct 7, 2025

@reeselevine I think you have the ability, merge at will.

reeselevine (Collaborator, Author)

@CISC yeah I will, I was just at the gym this morning and thought of a potential solution for my deadlock problems, so I'm trying a couple of commits to fix that, then I'll merge!

reeselevine (Collaborator, Author)

@CISC looks like the inflight-threads check fixed the deadlock issues! Thanks for your review of the logic. I'm going to remove the SERIALIZE option and then I'll merge.

@reeselevine reeselevine merged commit 74b8fc1 into ggml-org:master Oct 7, 2025
61 checks passed
anyshu pushed a commit to anyshu/llama.cpp that referenced this pull request Oct 10, 2025
* master: (113 commits)
  webui: updated the chat service to only include max_tokens in the req… (ggml-org#16489)
  cpu : optimize the ggml NORM operation (ggml-org#15953)
  server : host-memory prompt caching (ggml-org#16391)
  No markdown in cot (ggml-org#16483)
  model-conversion : add support for SentenceTransformers (ggml-org#16387)
  ci: add ARM64 Kleidiai build and test support (ggml-org#16462)
  CANN: Improve ACL graph matching (ggml-org#16166)
  kleidiai: kernel interface refactoring (ggml-org#16460)
  [SYCL] refactor soft_max, add soft_max_back (ggml-org#16472)
  model: EmbeddingGemma Adding Support for SentenceTransformers Dense Modules (ggml-org#16367)
  refactor: centralize CoT parsing in backend for streaming mode (ggml-org#16394)
  Disable CUDA host buffers on integrated GPUs (ggml-org#16308)
  server : fix cancel pending task (ggml-org#16467)
  metal : mark FA blocks (ggml-org#16372)
  server : improve context checkpoint logic (ggml-org#16440)
  ggml webgpu: profiling, CI updates, reworking of command submission (ggml-org#16452)
  llama : support LiquidAI LFM2-MoE hybrid model (ggml-org#16464)
  server : add `/v1/health` endpoint (ggml-org#16461)
  webui : added download action (ggml-org#13552) (ggml-org#16282)
  presets : fix pooling param for embedding models (ggml-org#16455)
  ...
