Closed
Description
Name and Version
./llama.cpp/build/bin/llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
version: 5967 (6c88b3bb)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
Operating systems
Linux
GGML backends
CUDA
Hardware
AMD EPYC 7402 + 4x4060 8GB
Models
unsloth/Qwen3-Coder-480B-A35B-Instruct-UD-Q2_K_XL
Problem description & steps to reproduce
Unable to run Qwen3-Coder-480B-A35B-Instruct-UD-Q2_K_XL with working tool calling.
Followed the guide at
https://docs.unsloth.ai/basics/qwen3-coder-how-to-run-locally
Related issues
QwenLM/qwen-code#48 (suggests the issue is with the llama.cpp chat template)
QwenLM/Qwen3-Coder#434
./llama.cpp/build/bin/llama-server --model unsloth/Qwen3-Coder-480B-A35B-Instruct-GGUF/UD-Q2_K_XL/Qwen3-Coder-480B-A35B-Instruct-UD-Q2_K_XL-00001-of-00004.gguf \
--threads -1 --ctx-size 20000 --temp 0.7 --min-p 0.0 --top-p 0.8 --top-k 20 \
--repeat-penalty 1.05 --n-gpu-layers 200 -ot ".ffn_(up|down)_exps.=CPU" --jinja
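To reproduce the tool-calling behavior against the server started above, a request with a `tools` array must be sent to llama-server's OpenAI-compatible `/v1/chat/completions` endpoint. A minimal sketch of such a request payload, assuming a hypothetical `write_file` tool definition matching the one the client (qwen-code) registers; the exact schema the client sends may differ:

```python
import json

# Sketch of a tool-calling request body for llama-server's
# OpenAI-compatible /v1/chat/completions endpoint. The "write_file"
# tool definition is illustrative (assumed, not taken from qwen-code).
def build_tool_call_request(prompt: str) -> dict:
    return {
        "messages": [{"role": "user", "content": prompt}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "write_file",
                    "description": "Write content to a file on disk",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "file_path": {"type": "string"},
                            "content": {"type": "string"},
                        },
                        "required": ["file_path", "content"],
                    },
                },
            }
        ],
    }

payload = build_tool_call_request("Write a hello world program")
print(json.dumps(payload, indent=2))
```

With a correctly applied chat template (`--jinja`), the response should contain a structured `tool_calls` entry rather than the call rendered as plain text in `content`.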
First Bad Commit
No response
Relevant log output
I'll create a simple "Hello, World!" program for you. Since you didn't specify a language, I'll
write it in Python, which is commonly used for beginners.
[tool_call: write_file for file_path '/Users/username/code/test/hello_world.py' with content
'print("Hello, World!")']
Instead of executing the tool call, the model simply prints it as plain text in the response content.
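For illustration, the printed text above can be recovered with a small parser. This is a hypothetical workaround sketch, not a fix: the regex below is hard-coded to the one `[tool_call: ...]` format observed in the log, and the field names (`file_path`, `content`) are taken from that single example:

```python
import re

# Workaround sketch (assumed format): the model emits its tool call as
# plain text like
#   [tool_call: write_file for file_path '...' with content '...']
# instead of a structured tool_calls field. This regex recovers the
# pieces from that one observed shape only.
TOOL_CALL_RE = re.compile(
    r"\[tool_call:\s*(?P<name>\w+)\s+for\s+file_path\s+'(?P<file_path>[^']*)'"
    r"\s+with\s+content\s+'(?P<content>[^']*)'\]",
    re.DOTALL,
)

def parse_printed_tool_call(text: str):
    """Return the tool name and arguments from the printed call, or None."""
    m = TOOL_CALL_RE.search(text)
    return m.groupdict() if m else None

example = ("[tool_call: write_file for file_path "
           "'/Users/username/code/test/hello_world.py' with content "
           "'print(\"Hello, World!\")']")
print(parse_printed_tool_call(example))
```

The proper fix is for llama.cpp's chat template handling to emit the call via the structured `tool_calls` response field so clients never see this text form.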