Server CUDA Infill Segmentation Fault #6672

@kherud

Description

With a CUDA build of the server, a segmentation fault can occur when using the /infill endpoint.
I tested this with release b2667, but the problem seems to have been present for at least 1-2 weeks.

The segmentation fault only seems to happen with models that don't support infilling (whatever that means exactly), but the situation should probably be handled more gracefully.

For example, CodeLlama-7B-GGUF does not produce a seg fault, but Mistral-7B-Instruct-v0.2-GGUF does.
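
One way to handle this more gracefully (just a sketch on my part, I haven't traced the exact crash site) would be to validate the model's fill-in-the-middle special tokens before the server builds the infill prompt, and reject the request for models where they are missing or out of range. The model_supports_infill helper below is hypothetical, not existing server code; it only relies on llama_token_prefix/middle/suffix and llama_n_vocab from llama.h:

#include "llama.h"

// Hypothetical check: a model without FIM support may report its
// prefix/middle/suffix special tokens as -1, or with ids that fall
// outside its vocabulary, so /infill should return an error instead
// of building a prompt from invalid token ids.
static bool model_supports_infill(const struct llama_model * model) {
    const int32_t n_vocab = llama_n_vocab(model);

    const llama_token fim_tokens[] = {
        llama_token_prefix(model),
        llama_token_middle(model),
        llama_token_suffix(model),
    };

    for (const llama_token t : fim_tokens) {
        if (t < 0 || t >= n_vocab) {
            return false;
        }
    }

    return true;
}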

Steps to reproduce:

System:

  • OS: Arch Linux
  • GPU: RTX 4090

Building the library:

mkdir build
cd build
cmake -DLLAMA_CUDA=ON -DLLAMA_CURL=ON ..
cmake --build . --config Release -j

Starting the server:

mkdir -p models/7B
./server -ngl 43 -mu https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q2_K.gguf

Making an infill request:

curl --request POST \
--url http://localhost:8080/infill \
--header "Content-Type: application/json" \
--data '{
    "input_prefix": "def remove_non_ascii(s: str) -> str:\n    \"\"\" ",
    "input_suffix": "\n    return result\n",
    "prompt": ""
}'

Metadata

Labels

bug (Something isn't working), good first issue (Good for newcomers)
