Name and Version
pwilkin@SYN-PC-11:/devel/models$ llama-cli --version
load_backend: loaded BLAS backend from /devel/tools/llama.cpp/build/bin/libggml-blas.so
register_backend: registered backend BLAS (1 devices)
register_device: registered device BLAS (OpenBLAS)
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
Device 0: NVIDIA GeForce RTX 5070 Ti, compute capability 12.0, VMM: yes
Device 1: NVIDIA GeForce RTX 5070 Ti, compute capability 12.0, VMM: yes
load_backend: loaded CUDA backend from /devel/tools/llama.cpp/build/bin/libggml-cuda.so
register_backend: registered backend CUDA (2 devices)
register_device: registered device CUDA0 (NVIDIA GeForce RTX 5070 Ti)
register_device: registered device CUDA1 (NVIDIA GeForce RTX 5070 Ti)
ggml_backend_load_best: /devel/tools/llama.cpp/build/bin/libggml-cpu-alderlake.so score: 128
ggml_backend_load_best: /devel/tools/llama.cpp/build/bin/libggml-cpu-icelake.so score: 0
ggml_backend_load_best: /devel/tools/llama.cpp/build/bin/libggml-cpu-haswell.so score: 64
ggml_backend_load_best: /devel/tools/llama.cpp/build/bin/libggml-cpu-skylakex.so score: 0
ggml_backend_load_best: /devel/tools/llama.cpp/build/bin/libggml-cpu-sapphirerapids.so score: 0
ggml_backend_load_best: /devel/tools/llama.cpp/build/bin/libggml-cpu-sandybridge.so score: 21
ggml_backend_load_best: /devel/tools/llama.cpp/build/bin/libggml-cpu-sse42.so score: 5
ggml_backend_load_best: /devel/tools/llama.cpp/build/bin/libggml-cpu-x64.so score: 1
load_backend: loaded CPU backend from /devel/tools/llama.cpp/build/bin/libggml-cpu-alderlake.so
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (Intel(R) Core(TM) i7-14700KF)
version: 6921 (eca77bf)
built with cc (Ubuntu 15.2.0-4ubuntu4) 15.2.0 for x86_64-linux-gnu
Operating systems
Linux
GGML backends
CUDA
Hardware
2x 5070 Ti
Models
Minimax-M2
Problem description & steps to reproduce
This runs complex queries without problems:
llama-server -m MiniMaxAI_MiniMax-M2-IQ3_M/MiniMaxAI_MiniMax-M2-IQ3_M-00001-of-00003.gguf -ngl 99 --cpu-moe --host 0.0.0.0 -c 50000 -fa on --alias syndatis --threads 24 --chat-template-file /devel/tools/llama.cpp/models/templates/unsloth-MiniMax-M2.jinja --jinja

This starts generating corrupted outputs with any prompt of non-trivial size (say, 500 tokens):

llama-server -m MiniMaxAI_MiniMax-M2-IQ3_M/MiniMaxAI_MiniMax-M2-IQ3_M-00001-of-00003.gguf -ngl 99 -ot "\.([0-9]|[0-5][0-9]|5[0-3])\.ffn_.*_exps=CPU,blk.5[4-8].*=CUDA0,blk.(6[0-2]|5[8-9]).*=CUDA1" --host 0.0.0.0 -c 50000 -fa on --alias syndatis --threads 24 --chat-template-file /devel/tools/llama.cpp/models/templates/unsloth-MiniMax-M2.jinja --jinja

Possibly related to #16935.
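For reference, a minimal sketch (not part of the original report) of which block indices each of the `-ot` patterns above captures. It assumes the overrides are matched as unanchored regular expressions against tensor names like `blk.<i>.ffn_up_exps.weight` and that the first matching pattern wins; the tensor names and resolution order here are assumptions for illustration, not llama.cpp's verified behavior:

```python
import re

# The three override patterns from the failing command, paired with their targets.
# Assumption: first matching pattern decides the placement of a tensor.
patterns = [
    (r"\.([0-9]|[0-5][0-9]|5[0-3])\.ffn_.*_exps", "CPU"),
    (r"blk.5[4-8].*", "CUDA0"),
    (r"blk.(6[0-2]|5[8-9]).*", "CUDA1"),
]

# Hypothetical expert tensor names, modeled on typical GGUF MoE naming.
for i in range(63):
    name = f"blk.{i}.ffn_up_exps.weight"
    for pattern, target in patterns:
        if re.search(pattern, name):
            print(f"{name} -> {target}")
            break
```

Running this shows expert tensors for blocks 0-59 landing on CPU and only 60-62 reaching CUDA1 under the first-match assumption, which is how the split was intended to compare against the `--cpu-moe` baseline.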
First Bad Commit
No response
Relevant log output
N/A