Labels
bug-unconfirmed, medium severity (Used to report medium severity bugs in llama.cpp, e.g. malfunctioning features that are still usable)
Description
What happened?
In commit 6e7d133a5f9409dd257fad90d7f320721b07a1b2, changes were made to how the /v1/chat/completions endpoint is handled.
Earlier, the call sequence was:
const int id_task = ctx_server.queue_tasks.get_new_id();
ctx_server.queue_results.add_waiting_task_id(id_task);
ctx_server.request_completion(id_task, -1, data, false, false);
...
ctx_server.queue_results.remove_waiting_task_id(id_task);
After the changes, the ctx_server.queue_results.remove_waiting_task_id(id_task); call is missing, which causes server_response.waiting_task_ids to grow after every call to the above endpoint. In a long-running llama-server instance in production, this ends up consuming a lot of memory, since the ids are never cleared for a few of the refactored server handlers.
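For reference, a minimal sketch of the expected handler flow, reusing only the identifiers from the pre-refactor snippet above (the exact post-refactor handler code may differ):
const int id_task = ctx_server.queue_tasks.get_new_id();
ctx_server.queue_results.add_waiting_task_id(id_task);   // id enters server_response.waiting_task_ids
ctx_server.request_completion(id_task, -1, data, false, false);
// ... receive the result and send the response back to the client ...
// This is the call dropped in the refactor; without it, id_task is never
// erased from server_response.waiting_task_ids and the set grows without bound.
ctx_server.queue_results.remove_waiting_task_id(id_task);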
@ngxson kindly provide your inputs. Thanks.
Name and Version
$ ./bin/llama-cli --version
version: 3609 (2f3c1466)
built with Homebrew clang version 18.1.5 for arm64-apple-darwin23.3.0
What operating system are you seeing the problem on?
Mac
Relevant log output
No response