Name and Version
$ ./build/bin/llama-server --version
version: 6940 (c5023da)
built with cc (Debian 14.2.0-19) 14.2.0 for x86_64-linux-gnu
Operating systems
No response
Which llama.cpp modules do you know to be affected?
No response
Command line
Problem description & steps to reproduce
- start llama-server with any model and a small context for testing (an example command is sketched after these steps)
- make a new conversation at http://localhost:8080/ and chat until the context is full
- switch to a new conversation and try to chat
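For reference, a minimal invocation along these lines sets up the scenario (a sketch, not the reporter's exact command; the model path and context size are placeholders):

$ ./build/bin/llama-server -m models/model.gguf -c 512 --port 8080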
What happens:
The web UI reports an error dialog:
"Server Error
The server responded with an error message. Review the details below.
No response received from server. Please try again."
What should happen:
The second conversation should work; a full context in the first conversation should not prevent chatting in a new one.
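The web UI talks to the server's OpenAI-compatible endpoint, so the underlying error can presumably also be inspected directly (a sketch; the message content is arbitrary and not from the report):

$ curl http://localhost:8080/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{"messages": [{"role": "user", "content": "hello"}]}'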
First Bad Commit
commit b52edd25586fabb70f0c21b274473b307cf14499
Author: Georgi Gerganov <[email protected]>
Date: Thu Oct 30 18:42:57 2025 +0200
server : remove n_past (#16818)
Relevant log output
srv params_from_: Chat format: Content-only
slot get_availabl: id 3 | task -1 | selected slot by LCP similarity, sim_best = 0.333 (> 0.100 thold), f_keep = 0.030
srv get_availabl: updating prompt cache
srv prompt_save: - saving prompt with length 99, total state size = 10.830 MiB
srv alloc: - prompt is already in the cache, skipping
srv load: - looking for better prompt, base f_keep = 0.030, sim = 0.333
srv update: - cache state: 1 prompts, 10.830 MiB (limits: 8192.000 MiB, 100 tokens, 74885 est)
srv update: - prompt 0x55ad16cdda50: 99 tokens, checkpoints: 0, 10.830 MiB
srv get_availabl: prompt cache update took 0.04 ms
slot launch_slot_: id 3 | task 25 | processing task
srv send_error: task id = 25, error: context shift is disabled
slot release: id 3 | task 25 | stop processing: n_tokens = 99, truncated = 0
srv update_slots: no tokens to decode
srv update_slots: all slots are idle
srv cancel_tasks: cancel task, id_task = 25
srv update_slots: all slots are idle
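For context on the log: the new request is routed to slot 3 because of LCP similarity (sim_best = 0.333, above the 0.100 threshold) even though almost none of that slot's cache is reusable (f_keep = 0.030), and the request then fails because context shift is disabled. The following is a minimal sketch of how slot selection by longest-common-prefix similarity can work; it is not llama.cpp's actual implementation, all names are invented here, and only the 0.100 threshold comes from the log above:

// Sketch of slot selection by LCP similarity, for illustration only.
#include <cstddef>
#include <vector>

struct Slot {
    int id;
    std::vector<int> cached_tokens; // tokens currently held in this slot's KV cache
};

// Length of the longest common prefix of two token sequences.
static size_t lcp_len(const std::vector<int> & a, const std::vector<int> & b) {
    size_t n = 0;
    while (n < a.size() && n < b.size() && a[n] == b[n]) {
        n++;
    }
    return n;
}

// Pick the slot whose cache shares the longest prefix with the new prompt.
// sim = lcp / prompt_length; a slot is only reused when sim exceeds the
// threshold, otherwise the caller falls back to a free slot. Presumably the
// log's f_keep is the reusable fraction of the slot's own cache, which here
// would be lcp / cached_tokens.size().
static Slot * select_slot(std::vector<Slot> & slots, const std::vector<int> & prompt, float sim_thold = 0.100f) {
    Slot * best = nullptr;
    float sim_best = 0.0f;
    for (auto & slot : slots) {
        if (prompt.empty()) {
            continue;
        }
        const float sim = (float) lcp_len(slot.cached_tokens, prompt) / prompt.size();
        if (sim > sim_thold && sim > sim_best) {
            sim_best = sim;
            best = &slot;
        }
    }
    return best; // nullptr: no sufficiently similar slot, use any idle one
}

Under such a scheme, a new conversation whose prompt shares a short prefix with a nearly full slot (e.g. a common system prompt or chat template header) can still be routed to that slot, which would match the failure above if the selected slot then has no room left and context shift is disabled.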