
Eval bug: llama-server.exe silently crashes (ucrtbased.dll) after 2-3 requests in a dialogue #13877

@characharm

Description


Name and Version

version: 5528 (53ae306)
built with MSVC 19.43.34810.0 for x64

Operating systems

Windows

GGML backends

Vulkan

Hardware

AMD Radeon RX 9070 XT, Intel Arc A770

Models

Model Qwen3-30B-A3B-UD-Q5_K_XL.gguf with fixed chat templates
SHA256: f284af35140194f073985a093f6d257cb7060784ecbfeb52c15f9545dfa4f434

Problem description & steps to reproduce

llama-server.exe -m Qwen3-30B-A3B-UD-Q5_K_XL.gguf -ngl 99 -c 15000 --port 8000 --jinja

The server silently terminates in some dialogues, typically after 2-3 requests within a single conversation.
The Windows Event Log records a crash event for llama-server.exe with ucrtbased.dll as the faulting module. (ucrtbased.dll is the debug build of the Universal C Runtime, which suggests a Debug build aborting on a CRT assertion.)
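The crash can be reproduced by replaying a multi-turn conversation against the server's OpenAI-compatible /v1/chat/completions endpoint. A minimal client sketch (the prompts, model name, and helper names here are illustrative, not from the report; llama-server must already be running with the command above):

```python
import json
import urllib.request

SERVER = "http://127.0.0.1:8000"  # matches --port 8000 in the command above

def make_payload(history, user_msg):
    """Build a chat request carrying the full dialogue so far."""
    messages = history + [{"role": "user", "content": user_msg}]
    return {"model": "Qwen3-30B-A3B-UD-Q5_K_XL", "messages": messages}

def send(payload):
    """POST one chat completion request; return the assistant message."""
    req = urllib.request.Request(
        SERVER + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]

def replay_dialogue(turns):
    """Send each turn with accumulated history; per the report, the crash
    typically hits on the second or third request of the dialogue."""
    history = []
    for user_msg in turns:
        payload = make_payload(history, user_msg)
        reply = send(payload)  # server dies here on the failing turn
        history.append(payload["messages"][-1])
        history.append(reply)
    return history
```

Calling `replay_dialogue(["Hello", "Tell me more", "And then?"])` against a running server exercises the 2-3 request pattern described above; when the server crashes, the client sees a connection error on that turn.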

First Bad Commit

e121edc

b5486

Relevant log output

srv  update_chat_: Parsing chat message: <think>

Parsing input with format Hermes 2 Pro: <think>

Partial parse: </think>
Parsed message: {"role":"assistant","content":null}
srv          send: sending result for task id = 0
srv          send: task id = 0 pushed to result queue
slot process_toke: id  0 | task 0 | n_decoded = 2, n_remaining = -1, next token:   198 '
'
srv  update_slots: run slots completed
que    start_loop: waiting for new tasks
que    start_loop: processing new tasks
que    start_loop: processing task, id = 2
que    start_loop: update slots
srv  update_slots: posting NEXT_RESPONSE
que          post: new task, id = 3, front = 0
slot update_slots: id  0 | task 0 | slot decode token, n_ctx = 15008, n_past = 903, n_cache_tokens = 903, truncated = 0
srv  update_slots: decoding batch, n_tokens = 1
set_embeddings: value = 0
clear_adapter_lora: call
srv  update_chat_: Parsing chat message: <think>
Х
Parsing input with format Hermes 2 Pro: <think>
Х
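The last log lines show the Hermes 2 Pro parser handling a partial <think> block and emitting "content":null for the in-progress message. A minimal sketch of that kind of incremental parse (hypothetical names, not llama.cpp's actual implementation) illustrates the state the server is in when generation stops mid-tag, and why downstream code must tolerate a missing content field:

```python
from typing import Optional

THINK_OPEN, THINK_CLOSE = "<think>", "</think>"

def parse_partial(text: str) -> dict:
    """Incrementally parse a streamed assistant message that may contain
    an unfinished <think>...</think> reasoning block."""
    reasoning: Optional[str] = None
    content: Optional[str] = None
    if text.startswith(THINK_OPEN):
        end = text.find(THINK_CLOSE)
        if end == -1:
            # Tag still open: everything so far is reasoning, no content yet.
            reasoning = text[len(THINK_OPEN):]
        else:
            reasoning = text[len(THINK_OPEN):end]
            content = text[end + len(THINK_CLOSE):]
    else:
        content = text
    return {"role": "assistant", "reasoning": reasoning, "content": content}
```

With input like the logged fragment ("<think>" followed by "Х"), content stays None, matching the logged {"role":"assistant","content":null}; any consumer that assumes content is a string on every partial parse would trip on exactly this state.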


