
Conversation

reeselevine (Collaborator)

This PR adds:

  • Optional profiling for the host (CPU) and device (GPU) sides, which will be helpful when working on optimization.
  • A bump of the WebGPU Ubuntu CI testing image to 24.04, matching the Vulkan CI.
  • A rework of command submission that avoids global locks when pushing to staging vectors; instead, each thread submits its own work in graph_compute.
  • While working on this, I ran into deadlocks, either because the parameter buffer pool was exhausted or, in some cases, due to what I believe are underlying issues in Dawn's thread-safety approach. Serializing each command submission seems to avoid the problem, so I added options for that and left a TODO in the code to explore other solutions.
  • For now, I'm comfortable leaving it as is, because one of the next major goals is getting the WebGPU backend running fully in the browser, which will likely be in a single-threaded context to start, as mentioned in webgpu : fix build on emscripten #15826.

@reeselevine reeselevine requested a review from CISC as a code owner October 7, 2025 03:47
@github-actions bot added the devops (improvements to build systems and github actions) and ggml (changes relating to the ggml tensor library for machine learning) labels on Oct 7, 2025
CISC (Collaborator) commented Oct 7, 2025

BTW, consider adding yourself to CODEOWNERS.

CISC (Collaborator) commented Oct 7, 2025

@reeselevine I think you have the ability, merge at will.

reeselevine (Collaborator, Author)

@CISC yeah I will, I was just at the gym this morning and thought of a potential solution for my deadlock problems, so I'm trying a couple of commits to fix that, then I'll merge!

reeselevine (Collaborator, Author)

@CISC looks like the inflight-threads check fixed the deadlock issues! Thanks for your review of the logic. I'm going to remove the SERIALIZE option and then I'll merge.

@reeselevine reeselevine merged commit 74b8fc1 into ggml-org:master Oct 7, 2025
61 checks passed
anyshu pushed a commit to anyshu/llama.cpp that referenced this pull request Oct 10, 2025
* master: (113 commits)
  webui: updated the chat service to only include max_tokens in the req… (ggml-org#16489)
  cpu : optimize the ggml NORM operation (ggml-org#15953)
  server : host-memory prompt caching (ggml-org#16391)
  No markdown in cot (ggml-org#16483)
  model-conversion : add support for SentenceTransformers (ggml-org#16387)
  ci: add ARM64 Kleidiai build and test support (ggml-org#16462)
  CANN: Improve ACL graph matching (ggml-org#16166)
  kleidiai: kernel interface refactoring (ggml-org#16460)
  [SYCL] refactor soft_max, add soft_max_back (ggml-org#16472)
  model: EmbeddingGemma Adding Support for SentenceTransformers Dense Modules (ggml-org#16367)
  refactor: centralize CoT parsing in backend for streaming mode (ggml-org#16394)
  Disable CUDA host buffers on integrated GPUs (ggml-org#16308)
  server : fix cancel pending task (ggml-org#16467)
  metal : mark FA blocks (ggml-org#16372)
  server : improve context checkpoint logic (ggml-org#16440)
  ggml webgpu: profiling, CI updates, reworking of command submission (ggml-org#16452)
  llama : support LiquidAI LFM2-MoE hybrid model (ggml-org#16464)
  server : add `/v1/health` endpoint (ggml-org#16461)
  webui : added download action (ggml-org#13552) (ggml-org#16282)
  presets : fix pooling param for embedding models (ggml-org#16455)
  ...
