
Eval bug: Unable to run Qwen3-Coder-480B-A35B-Instruct-UD-Q2_K_XL #14915

@ambud

Description

Name and Version

./llama.cpp/build/bin/llama-cli  --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
version: 5967 (6c88b3bb)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu

Operating systems

Linux

GGML backends

CUDA

Hardware

AMD EPYC 7402 + 4x4060 8GB

Models

unsloth/Qwen3-Coder-480B-A35B-Instruct-UD-Q2_K_XL

Problem description & steps to reproduce

Unable to use Qwen3-Coder-480B-A35B-Instruct-UD-Q2_K_XL for tool calling through llama-server: the model's tool calls come back as plain text instead of structured tool calls, so the client never executes them.

Following the Unsloth guide at
https://docs.unsloth.ai/basics/qwen3-coder-how-to-run-locally

Related issues:
QwenLM/qwen-code#48 (indicates the issue is with the llama.cpp chat template)
QwenLM/Qwen3-Coder#434

./llama.cpp/build/bin/llama-server \
    --model unsloth/Qwen3-Coder-480B-A35B-Instruct-GGUF/UD-Q2_K_XL/Qwen3-Coder-480B-A35B-Instruct-UD-Q2_K_XL-00001-of-00004.gguf \
    --threads -1 \
    --ctx-size 20000 \
    --temp 0.7 \
    --min-p 0.0 \
    --top-p 0.8 \
    --top-k 20 \
    --repeat-penalty 1.05 \
    --n-gpu-layers 200 \
    -ot ".ffn_(up|down)_exps.=CPU" \
    --jinja
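
To isolate whether the problem is in llama-server's template handling or in the client, a direct request can be sent to the OpenAI-compatible chat completions endpoint. This is a minimal sketch, assuming llama-server is listening on its default port 8080; the write_file tool schema below is a hypothetical stand-in for whatever the client actually registers:

# assumes llama-server is listening on the default port 8080
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Create hello_world.py that prints Hello, World!"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "write_file",
          "description": "Write content to a file",
          "parameters": {
            "type": "object",
            "properties": {
              "file_path": {"type": "string"},
              "content": {"type": "string"}
            },
            "required": ["file_path", "content"]
          }
        }
      }
    ]
  }'

If the chat template is applied correctly, the response should carry the call in a structured tool_calls field rather than rendering it inside message.content as plain text.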

First Bad Commit

No response

Relevant log output

I'll create a simple "Hello, World!" program for you. Since you didn't specify a language, I'll
  write it in Python, which is commonly used for beginners.

  [tool_call: write_file for file_path '/Users/username/code/test/hello_world.py' with content
  'print("Hello, World!")']

Instead of executing the tool call, the client simply prints it as text.
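
For comparison, a working setup would return the call as structured data in the OpenAI-compatible response, roughly like this (sketch; field values are illustrative only):

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_0",
            "type": "function",
            "function": {
              "name": "write_file",
              "arguments": "{\"file_path\": \"hello_world.py\", \"content\": \"print(\\\"Hello, World!\\\")\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}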
