Closed as duplicate of #13812
Description
Name and Version
version: 5528 (53ae306)
built with MSVC 19.43.34810.0 for x64
Operating systems
Windows
GGML backends
Vulkan
Hardware
AMD RX 9070 XT, Intel Arc A770
Models
Qwen3-30B-A3B-UD-Q5_K_XL.gguf (with fixed chat template)
SHA256: f284af35140194f073985a093f6d257cb7060784ecbfeb52c15f9545dfa4f434
Problem description & steps to reproduce
llama-server.exe -m Qwen3-30B-A3B-UD-Q5_K_XL.gguf -ngl 99 -c 15000 --port 8000 --jinja
The server terminates silently in some dialogues, typically after 2-3 requests within a single dialogue.
The Windows Event Log records a crash event for llama-server.exe, with ucrtbased.dll (the debug C runtime) as the faulting module.
First Bad Commit
b5486
Relevant log output
srv update_chat_: Parsing chat message: <think>
Parsing input with format Hermes 2 Pro: <think>
Partial parse: </think>
Parsed message: {"role":"assistant","content":null}
srv send: sending result for task id = 0
srv send: task id = 0 pushed to result queue
slot process_toke: id 0 | task 0 | n_decoded = 2, n_remaining = -1, next token: 198 '
'
srv update_slots: run slots completed
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 2
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 3, front = 0
slot update_slots: id 0 | task 0 | slot decode token, n_ctx = 15008, n_past = 903, n_cache_tokens = 903, truncated = 0
srv update_slots: decoding batch, n_tokens = 1
set_embeddings: value = 0
clear_adapter_lora: call
srv update_chat_: Parsing chat message: <think>
Х
Parsing input with format Hermes 2 Pro: <think>
Х