Description
An error occurs when the prompt is longer than a few tokens. Launch arguments:
#!/bin/bash
export LD_LIBRARY_PATH=/home/tug/Desktop/bin/llama.cpp/build:$LD_LIBRARY_PATH
export HIP_VISIBLE_DEVICES=0
cd "/home/tug/Desktop/bin/llama.cpp/build"
numactl -N 0 -m 0 \
./llama-server \
--n-gpu-layers 99 \
--threads 40 \
--threads-batch 40 \
--ctx-size 35000 \
--batch-size 2048 \
-ub 510 \
--override-tensor exps=CPU \
--host 0.0.0.0 \
--port 8080 \
-fa on \
--jinja \
--model "/media/tug/AI NVMe/MODELS/DeepSeek-V3.1-Q4_0/DeepSeek-V3.1-Q4_0-00001-of-00008.gguf"
read -p "Press ENTER to close..."
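Once launched, the server can be sanity-checked before sending any prompts (a small sketch, assuming llama-server's standard /health endpoint and the host/port set above):
# Should report an OK status once the model has finished loading.
curl -s http://localhost:8080/health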
Name and Version
llama-server b6399
version: 0 (unknown)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
Operating systems
Linux
GGML backends
HIP
Hardware
MI100 (gfx908), 2x Xeon 2300
Models
DeepSeek-V3.1-Q4_0
Problem description & steps to reproduce
Works with a small user prompt but crashes when the prompt is longer.
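For reference, a minimal way to reproduce once the server from the script above is running (a hedged sketch: it assumes the standard /completion endpoint of llama-server on the host/port configured above, and simply builds a filler prompt of a few hundred tokens, comparable to the 417-token prompt in the log below):
# Hypothetical repro: short prompts respond normally, a prompt of a few
# hundred tokens triggers the abort during prompt processing.
PROMPT=$(printf 'lorem ipsum %.0s' {1..200})
curl -s http://localhost:8080/completion \
  -H "Content-Type: application/json" \
  -d "{\"prompt\": \"$PROMPT\", \"n_predict\": 16}"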
First Bad Commit
No response
Relevant log output
slot update_slots: id 0 | task 0 | new prompt, n_ctx_slot = 35008, n_keep = 0, n_prompt_tokens = 417
slot update_slots: id 0 | task 0 | kv cache rm [0, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_past = 417, n_tokens = 417, progress = 1.000000
slot update_slots: id 0 | task 0 | prompt done, n_past = 417, n_tokens = 417
/shared/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:87: ROCm error
ROCm error: CUBLAS_STATUS_NOT_SUPPORTED
current device: 0, in function ggml_cuda_op_mul_mat_cublas at /shared/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:1302
hipblasGemmEx(ctx.cublas_handle(id), HIPBLAS_OP_T, HIPBLAS_OP_N, row_diff, src1_ncols, ne10, &alpha, src0_ptr, HIPBLAS_R_16F, ne00, src1_ptr, HIPBLAS_R_16F, ne10, &beta, dst_dd_i, HIPBLAS_R_32F, ldc, HIPBLAS_R_32F, HIPBLAS_GEMM_DEFAULT)
[New LWP 947059]
[New LWP 947058]
[New LWP 947057]
[New LWP 947056]
[New LWP 947055]
[New LWP 947054]
[New LWP 947053]
[New LWP 947052]
[New LWP 947051]
[New LWP 947050]
[New LWP 947049]
[New LWP 947048]
[New LWP 947047]
[New LWP 947046]
[New LWP 947045]
[New LWP 947044]
[New LWP 947043]
[New LWP 947042]
[New LWP 947041]
[New LWP 947040]
[New LWP 947039]
[New LWP 947038]
[New LWP 947037]
[New LWP 947036]
[New LWP 947035]
[New LWP 947034]
[New LWP 947033]
[New LWP 947032]
[New LWP 947031]
[New LWP 947030]
[New LWP 947029]
[New LWP 947028]
[New LWP 947027]
[New LWP 947026]
[New LWP 947025]
[New LWP 947024]
[New LWP 947023]
[New LWP 947022]
[New LWP 947021]
[New LWP 946900]
[New LWP 945802]
[New LWP 945801]
[New LWP 945800]
[New LWP 945799]
[New LWP 945798]
[New LWP 945797]
[New LWP 945796]
[New LWP 945795]
[New LWP 945794]
[New LWP 945793]
[New LWP 945792]
[New LWP 945791]
[New LWP 945790]
[New LWP 945789]
[New LWP 945788]
[New LWP 945787]
[New LWP 945786]
[New LWP 945785]
[New LWP 945784]
[New LWP 945783]
[New LWP 945782]
[New LWP 945781]
[New LWP 945780]
[New LWP 945779]
[New LWP 945778]
[New LWP 945777]
[New LWP 945776]
[New LWP 945775]
[New LWP 945774]
[New LWP 945773]
[New LWP 945772]
[New LWP 945771]
[New LWP 945770]
[New LWP 945769]
[New LWP 945768]
[New LWP 945767]
[New LWP 945766]
[New LWP 945765]
[New LWP 945764]
[New LWP 945763]
[New LWP 945762]
[New LWP 945761]
[New LWP 945760]
[New LWP 945759]
[New LWP 945758]
[New LWP 945757]
[New LWP 945756]
[New LWP 945755]
[New LWP 945754]
[New LWP 945753]
[New LWP 945752]
[New LWP 945751]
[New LWP 945750]
[New LWP 945749]
[New LWP 945748]
[New LWP 945747]
[New LWP 945746]
[New LWP 945745]
[New LWP 945744]
[New LWP 945743]
[New LWP 945742]
[New LWP 945741]
[New LWP 945740]
[New LWP 945739]
[New LWP 945738]
[New LWP 945737]
[New LWP 945736]
[New LWP 945735]
[New LWP 945734]
[New LWP 945733]
[New LWP 945732]
[New LWP 945731]
[New LWP 945730]
[New LWP 945729]
[New LWP 945728]
[New LWP 945727]
[New LWP 945726]
[New LWP 945725]
[New LWP 945724]
[New LWP 945723]
[New LWP 945722]
[New LWP 945406]
This GDB supports auto-downloading debuginfo from the following URLs:
https://debuginfod.ubuntu.com
Enable debuginfod for this session? (y or [n]) [answered N; input not from terminal]
Debuginfod has been disabled.
To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.
warning: could not find '.gnu_debugaltlink' file for /lib/x86_64-linux-gnu/liblber.so.2
warning: could not find '.gnu_debugaltlink' file for /lib/x86_64-linux-gnu/libbrotlidec.so.1
warning: could not find '.gnu_debugaltlink' file for /lib/x86_64-linux-gnu/libbrotlicommon.so.1
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x000079a095f107e3 in __GI___wait4 (pid=950067, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
warning: 30 ../sysdeps/unix/sysv/linux/wait4.c: No such file or directory
#0 0x000079a095f107e3 in __GI___wait4 (pid=950067, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
30 in ../sysdeps/unix/sysv/linux/wait4.c
#1 0x000079a0965715f3 in ggml_print_backtrace () from libggml-base.so
#2 0x000079a09657179b in ggml_abort () from libggml-base.so
#3 0x000079a09153ad62 in ggml_cuda_error(char const*, char const*, char const*, int, char const*) () from libggml-hip.so
#4 0x000079a091549b95 in ggml_cuda_op_mul_mat_cublas(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*, char const*, float const*, char const*, float*, long, long, long, long, ihipStream_t*) () from libggml-hip.so
#5 0x000079a091547fea in ggml_cuda_op_mul_mat(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*, void (*)(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*, char const*, float const*, char const*, float*, long, long, long, long, ihipStream_t*), void (*)(float const*, int const*, void*, ggml_type, long, long, long, long, long, long, long, long, ihipStream_t*)) () from libggml-hip.so
#6 0x000079a091542d86 in ggml_cuda_mul_mat(ggml_backend_cuda_context&, ggml_tensor const*, ggml_tensor const*, ggml_tensor*) () from libggml-hip.so
#7 0x000079a091540bad in ggml_backend_cuda_graph_compute(ggml_backend*, ggml_cgraph*) () from libggml-hip.so
#8 0x000079a09658be07 in ggml_backend_sched_graph_compute_async () from libggml-base.so
#9 0x000079a09669e591 in llama_context::graph_compute(ggml_cgraph*, bool) () from libllama.so
#10 0x000079a09669f994 in llama_context::process_ubatch(llama_ubatch const&, llm_graph_type, llama_memory_context_i*, ggml_status&) () from libllama.so
#11 0x000079a0966a5c6d in llama_context::decode(llama_batch const&) () from libllama.so
#12 0x000079a0966a6baf in llama_decode () from libllama.so
#13 0x000059852106b2a2 in server_context::update_slots() ()
#14 0x00005985210317ac in server_queue::start_loop() ()
#15 0x0000598520ff545b in main ()
[Inferior 1 (process 945365) detached]
/home/tug/Desktop/R1V3HIP.sh: line 20: 945365 Aborted (core dumped) numactl -N 0 -m 0 ./llama-server --n-gpu-layers 99 --threads 40 --threads-batch 40 --ctx-size 35000 --batch-size 2048 -ub 510 --override-tensor exps=CPU --host 0.0.0.0 --port 8080 -fa off --jinja --model "/media/tug/AI NVMe/MODELS/DeepSeek-V3.1-Q4_0/DeepSeek-V3.1-Q4_0-00001-of-00008.gguf"
Compiled with:
cmake -S . -B build \
-DGGML_HIP=ON \
-DGPU_TARGETS="gfx908" \
-DCMAKE_BUILD_TYPE=Release \
-Dhipblas_DIR=$HIPBLAS_DIR \
-DLLAMA_CURL=OFF
cmake --build build --config Release -- -j16
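A couple of optional checks after building (hedged; they assume the ROCm tools are on PATH and the default build/bin layout of llama.cpp):
# Confirm the GPU really reports the gfx908 target the build was made for.
rocminfo | grep -m1 gfx
# Confirm which binary/build number is actually being run.
./build/bin/llama-server --version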