webgpu : fix build on emscripten #15826
base: master
Conversation
The `.gitignore` diff:

```diff
 .ccache/
+
+# emscripten
+a.out.*
```
Why not just `a.out*`?
@ggerganov @slaren Quick question: I'm building a single-thread WASM version, and the threadpool code is a problem there. So I'm wondering, is there any way to completely disable the threadpool?

Edit: I'm referring to this code: llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c, lines 3124 to 3130 in c4df49a
I think a threadpool is currently required - I don't think there is an easy workaround. @max-krasnyansky any thoughts?
Hmm, ok - that means both wllama and whisper.cpp single-thread wasm builds are currently broken. Having single-thread support would be nice, but it's not urgent.
Yes, we should support launching a single-thread compute path without invoking synchronization primitives or spawning threads, so that thread-less WASM works. Shouldn't be hard to implement - looking at the implementation, I think almost everything is in place for that. Where does the single-thread WASM build fail when you call ggml compute with `n_threads = 1`?
It currently fails at this line: llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c, line 3091 in c4df49a

I don't have the stack trace due to some difficulty debugging in-browser, but it's very likely invoked by this line: llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c, line 3129 in c4df49a, where we try to create a threadpool of one single thread.
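For context, here is a minimal standalone sketch of the kind of single-thread fast path being discussed - all names (`worker_args`, `worker`, `compute`) are illustrative and not the actual ggml internals. The idea is that when `n_threads == 1`, everything runs inline on the calling thread, so `pthread_create` (and thus `ggml_thread_create`) is never reached:

```c
#include <pthread.h>
#include <stdio.h>

// Illustrative types/names, not the real ggml ones.
typedef struct { int ith; int nth; } worker_args;

static void * worker(void * arg) {
    worker_args * w = (worker_args *) arg;
    printf("thread %d/%d running\n", w->ith, w->nth);
    return NULL;
}

static void compute(int n_threads) {
    worker_args main_args = { 0, n_threads };

    // Single-thread fast path: no thread creation, no synchronization,
    // so a thread-less WASM build never touches pthreads.
    if (n_threads == 1) {
        worker(&main_args);
        return;
    }

    // Multi-thread path: spawn only the secondary threads (j starts at 1),
    // mirroring the loop quoted below. Capped at 16 for this toy example.
    pthread_t   thrd[16];
    worker_args args[16];
    for (int j = 1; j < n_threads; j++) {
        args[j] = (worker_args) { j, n_threads };
        pthread_create(&thrd[j], NULL, worker, &args[j]);
    }
    worker(&main_args);
    for (int j = 1; j < n_threads; j++) {
        pthread_join(thrd[j], NULL);
    }
}

int main(void) {
    compute(1); // safe even when the runtime cannot spawn threads
    return 0;
}
```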
Also just want to note that the atomic ops used for threadpool synchronization may need attention in the thread-less build, too.
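For what it's worth, here is a quick standalone way to check how C11 atomics behave in a thread-less build (a hypothetical test program, not part of this PR; my understanding is that emscripten lowers atomics to plain single-threaded ops when built without `-pthread`):

```c
#include <stdatomic.h>
#include <stdio.h>

// Build with `emcc atomics_check.c` (no -pthread) and run the output under
// node: the atomics compile and behave as ordinary single-threaded ops.
int main(void) {
    atomic_int n = 0;
    atomic_fetch_add_explicit(&n, 1, memory_order_relaxed);
    printf("n = %d\n", atomic_load(&n));
    return 0;
}
```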
To enter that loop, it would mean that `tpp->n_threads > 1`:

```c
for (int j = 1; j < tpp->n_threads; j++) {
    ggml_thread_cpumask_next(tpp->cpumask, workers[j].cpumask, tpp->strict_cpu, &cpumask_iter);

    int32_t rc = ggml_thread_create(&workers[j].thrd, NULL, ggml_graph_compute_secondary_thread, &workers[j]);
    GGML_ASSERT(rc == 0);
}
```

Could the calling program be using `n_threads > 1`?
Yeah you're right, the `n_threads` on the calling side was indeed greater than 1.
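Until there is a proper fix, a caller-side workaround sketch (assuming the standard ggml CPU backend API; header locations vary across ggml versions, and error handling is omitted) is to pin the CPU backend to one thread so the spawn loop above never executes:

```c
#include "ggml-backend.h"
#include "ggml-cpu.h"

int main(void) {
    ggml_backend_t backend = ggml_backend_cpu_init();

    // With n_threads == 1 the secondary-thread loop (j = 1 .. n_threads-1)
    // has zero iterations, so ggml_thread_create is never called.
    ggml_backend_cpu_set_n_threads(backend, 1);

    // ... build and compute graphs as usual ...

    ggml_backend_free(backend);
    return 0;
}
```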
What's the status on this PR, and more generally, on potential integration of WebGPU with wllama, @ngxson? When #16400 is merged, the WebGPU backend should be able to run all the operations for a decent number of text-generation models (except flash attention, but my understanding is that it falls back to standard multi-kernel attention for now). There's still some work to do on optimizing the existing shaders for it to really work well, but it would be great to also start getting it ready in a demo form in the browser. Happy to help work on the integration if you'd like, too.
Hey @reeselevine, I still have some weird issues where running this can cause the browser to hang. Need to investigate this a bit more.
Ref original webgpu PR: #14978
Example command:
```sh
# install emscripten first, e.g.:
brew install emscripten

emcmake cmake -B build-wasm -DGGML_WEBGPU=ON -DLLAMA_CURL=OFF -DGGML_WEBGPU_DEBUG=ON
cmake --build build-wasm --target test-backend-ops
```
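Emscripten builds emit a JS loader plus a `.wasm` module. One way to try the result, assuming the build drops the artifacts under `build-wasm/bin` (path not verified here), is to serve them over HTTP and open them in a WebGPU-capable browser:

```sh
# Serve the assumed output directory, then open http://localhost:8000
# in a browser with WebGPU enabled (e.g. recent Chrome).
python3 -m http.server --directory build-wasm/bin 8000
```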