Info
Version: c12452c
Intel x86_64 with LLAMA_CUDA=1
Summary
Possibly similar to #7133. As I continue to goof up my API calls, I may keep stumbling on more of these.
When ./server is given a JSON payload at the /completions route with a string-type system_prompt field, the server crashes with an abort. This denies access to clients until the server is restarted. This may also affect other routes and fields.
Example
The server readme says that system_prompt should be an object such as:
{
"system_prompt": {
"prompt": "Transcript of a never ending dialog, where the User interacts with an Assistant.\nThe Assistant is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.\nUser: Recommend a nice restaurant in the area.\nAssistant: I recommend the restaurant \"The Golden Duck\". It is a 5 star restaurant with a great view of the city. The food is delicious and the service is excellent. The prices are reasonable and the portions are generous. The restaurant is located at 123 Main Street, New York, NY 10001. The phone number is (212) 555-1234. The hours are Monday through Friday from 11:00 am to 10:00 pm. The restaurant is closed on Saturdays and Sundays.\nUser: Who is Richard Feynman?\nAssistant: Richard Feynman was an American physicist who is best known for his work in quantum mechanics and particle physics. He was awarded the Nobel Prize in Physics in 1965 for his contributions to the development of quantum electrodynamics. He was a popular lecturer and author, and he wrote several books, including \"Surely You're Joking, Mr. Feynman!\" and \"What Do You Care What Other People Think?\".\nUser:",
"anti_prompt": "User:",
"assistant_name": "Assistant:"
}
}
Giving a string causes an abort.
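For illustration, here is a minimal sketch (in Python rather than the server's C++) of the kind of guard that could turn this abort into a client error. `parse_system_prompt` is a hypothetical helper, not a function in llama.cpp; the field names are taken from the README example above:

```python
import json

def parse_system_prompt(payload: dict):
    """Hypothetical guard: validate system_prompt instead of assuming an object."""
    sp = payload.get("system_prompt")
    if sp is None:
        return None
    if not isinstance(sp, dict):
        # Reject with a client error instead of aborting the process.
        raise ValueError("system_prompt must be an object")
    return {
        "prompt": sp.get("prompt", ""),
        "anti_prompt": sp.get("anti_prompt", ""),
        "assistant_name": sp.get("assistant_name", ""),
    }

# The crashing payload from this report is rejected cleanly:
bad = json.loads('{"prompt": "", "n_predict": 16, "system_prompt": ""}')
try:
    parse_system_prompt(bad)
except ValueError as e:
    print(e)  # system_prompt must be an object
```

In an HTTP handler, the raised error would map to a 400 response rather than a process abort.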
Debug build:
$ make clean && LLAMA_CUDA=1 LLAMA_DEBUG=1 make -j
$ gdb ./server
[... SNIP ...]
(gdb) r --model models/Meta-Llama-3-8B-Instruct.Q8_0.gguf --host 0.0.0.0
$ curl http://<REDACTED>/completions -H "Content-Type: application/json" --data '{"prompt": "", "n_predict": 16, "system_prompt": ""}'
terminate called after throwing an instance of 'nlohmann::json_abi_v3_11_3::detail::type_error'
what(): [json.exception.type_error.306] cannot use value() with string
Thread 1 "server" received signal SIGABRT, Aborted.
__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6,
no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
44 ./nptl/pthread_kill.c: No such file or directory.
This reproduces in a non-debug build outside of gdb, crashing the server:
$ ./server --model models/Meta-Llama-3-8B-Instruct.Q8_0.gguf --host 0.0.0.0
[... SNIP ...]
{"tid":"140519807213568","timestamp":1715202496,"level":"INFO","function":"main","line":3785,"msg":"HTTP server listening","n_threads_http":"7","port":"8080","hostname":"0.0.0.0"}
{"tid":"140519807213568","timestamp":1715202496,"level":"INFO","function":"update_slots","line":1809,"msg":"all slots are idle"}
terminate called after throwing an instance of 'nlohmann::json_abi_v3_11_3::detail::type_error'
what(): [json.exception.type_error.306] cannot use value() with string
Aborted (core dumped)
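The `type_error.306` message suggests the handler calls nlohmann's `value()` on the `system_prompt` element while it is a JSON string rather than an object; `value()` is only valid on JSON objects. Assuming that reading of the error, a rough Python analogue of the failing access:

```python
import json

payload = json.loads('{"prompt": "", "n_predict": 16, "system_prompt": ""}')
system_prompt = payload["system_prompt"]  # a str, not a dict

try:
    # Roughly analogous to the C++ side calling
    # system_prompt.value("prompt", ""), which throws
    # type_error.306 when system_prompt is a string.
    system_prompt.get("prompt", "")
except AttributeError as e:
    print("crash analogue:", e)
```

The difference is that Python's exception here is catchable application-side, while the server's uncaught `type_error` terminates the process.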
Impact
Given a llama.cpp ./server endpoint, it can at least be crashed using an invalid payload. This denies the availability of the server and all API endpoints until it is restarted.