Closed
Labels: bug, good first issue
Description
With a CUDA build of the server, a segmentation fault can occur when using the /infill endpoint.
I tested this with release b2667, but the problem seems to have been present for at least 1-2 weeks.
The segmentation fault only seems to happen with models that don't support infilling (presumably models whose vocabulary does not define the fill-in-the-middle special tokens), but the situation should probably be handled more gracefully.
For example, CodeLlama-7B-GGUF does not produce a segfault, but Mistral-7B-Instruct-v0.2-GGUF does.
Steps to reproduce:
System:
- OS: Arch Linux
- GPU: RTX 4090
Building the library:
mkdir build
cd build
cmake -DLLAMA_CUDA=ON -DLLAMA_CURL=ON ..
cmake --build . --config Release -j
Starting the server:
mkdir -p models/7B
./server -ngl 43 -mu https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q2_K.gguf
Making an infill request:
curl --request POST \
--url http://localhost:8080/infill \
--header "Content-Type: application/json" \
--data '{
"input_prefix": "def remove_non_ascii(s: str) -> str:\n \"\"\" ",
"input_suffix": "\n return result\n",
"prompt": ""
}'
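For repeated testing, the same request can be scripted; this is a minimal Python sketch (stdlib only) that mirrors the curl command above, assuming the server is listening on localhost:8080. On a crash-prone build the server may die mid-request, which surfaces client-side as a connection error:

```python
import json
import urllib.request

# Same payload as the curl command above.
payload = {
    "input_prefix": 'def remove_non_ascii(s: str) -> str:\n """ ',
    "input_suffix": "\n return result\n",
    "prompt": "",
}

def make_infill_request(url="http://localhost:8080/infill"):
    # POST the infill request with a JSON body; if the server segfaults,
    # this raises (e.g. ConnectionResetError) instead of returning.
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    return urllib.request.urlopen(req)

if __name__ == "__main__":
    # Print the payload so it can be diffed against the curl --data string.
    print(json.dumps(payload))
```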