Name and Version
master
Operating systems
Linux
GGML backends
Vulkan
Hardware
Main system (where I ran the git bisection):
ggml_vulkan: 0 = Intel(R) Iris(R) Xe Graphics (TGL GT2) (Intel open-source Mesa driver)
Also seeing the issue on this system:
ggml_vulkan: 0 = Intel(R) UHD Graphics 630 (CFL GT2) (Intel open-source Mesa driver)
Models
smollm:135m
Problem description & steps to reproduce
When I run llama-run (same with llama-server), inference either crashes or outputs garbage.
../build.vulkan-linux/bin/llama-run ~/models/smollm:135m "say nothing" --ngl 99 --verbose
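For context on the assertion in the logs below: `llama_sampler_dist_apply` selects a token by drawing from the candidates' probability distribution and asserts that a candidate was actually found. A rough, hypothetical Python sketch (not the actual llama.cpp code) of why NaN/garbage probabilities coming back from a broken GPU kernel can make that lookup fail:

```python
import math
import random

def softmax(logits):
    # standard softmax; a single NaN logit poisons every probability
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def dist_sample(probs, rng=random.random):
    """Cumulative-sum sampling: return the first index whose cumulative
    probability covers the random draw, or None if nothing matches."""
    r = rng() * sum(probs)
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r <= cum:
            return i  # "found"
    return None       # nothing found -> analogous to the failed assert

# healthy logits: some token index is always selected
assert dist_sample(softmax([2.0, 1.0, 0.1])) in (0, 1, 2)

# NaN logits (e.g. garbage from a faulty Vulkan shader): every comparison
# against NaN is False, so no index is ever "found"
assert dist_sample(softmax([float("nan"), 1.0, 0.1])) is None
```

This is only an illustration of the failure mode; whether the bad logits here come from the Vulkan backend producing NaNs or some other corruption is exactly what the bisection below is trying to pin down.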
First Bad Commit
Relevant log output
git reset --hard b6874
# rebuild
../build.vulkan-linux/bin/llama-run ~/models/smollm:135m "say nothing" --ngl 99 --verbose
<answers>
git reset --hard b6875
# rebuild --> compilation takes forever...
git reset --hard b6876
# rebuild
../build.vulkan-linux/bin/llama-run ~/models/smollm:135m "say nothing" --ngl 99 --verbose
llama_context: Vulkan0 compute buffer size = 98.25 MiB
llama_context: Vulkan_Host compute buffer size = 5.14 MiB
llama_context: graph nodes = 937
llama_context: graph splits = 2
llama-run: /var/home/kpouget/pod-virt/remoting/linux-work/llama_cpp/src/src/llama-sampling.cpp:662: void llama_sampler_dist_apply(llama_sampler*, llama_token_data_array*): Assertion `found' failed.
./run.linux.sh: line 1: 97210 Aborted
git reset --hard b6969
# rebuild
llama_context: Vulkan0 compute buffer size = 98.25 MiB
llama_context: Vulkan_Host compute buffer size = 5.14 MiB
llama_context: graph nodes = 937
llama_context: graph splits = 2
llama-run: /var/home/kpouget/pod-virt/remoting/linux-work/llama_cpp/src/src/llama-sampling.cpp:662: void llama_sampler_dist_apply(llama_sampler*, llama_token_data_array*): Assertion `found' failed.
./run.linux.sh: line 1: 108491 Aborted