Server CUDA Infill Segmentation Fault #6672

@kherud

Description

With a CUDA build of the server, a segmentation fault can occur when using the /infill endpoint.
I tested this with release b2667, but the problem seems to have been present for at least 1-2 weeks.

The segmentation fault only seems to happen with models that don't support infilling (whatever that means exactly), but the situation should probably be handled more gracefully.

For example, CodeLlama-7B-GGUF does not produce a seg fault, but Mistral-7B-Instruct-v0.2-GGUF does.
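
One way to handle this more gracefully (just a sketch on my part, I haven't traced the exact crash site) would be to validate the model's fill-in-the-middle special tokens before the server builds the infill prompt, and reject the request for models where they are missing or out of range. The model_supports_infill helper below is hypothetical, not existing server code; it only relies on llama_token_prefix/middle/suffix and llama_n_vocab from llama.h:

#include "llama.h"

// Hypothetical check: a model without FIM support may report its
// prefix/middle/suffix special tokens as -1, or with ids that fall
// outside its vocabulary, so /infill should return an error instead
// of building a prompt from invalid token ids.
static bool model_supports_infill(const struct llama_model * model) {
    const int32_t n_vocab = llama_n_vocab(model);

    const llama_token fim_tokens[] = {
        llama_token_prefix(model),
        llama_token_middle(model),
        llama_token_suffix(model),
    };

    for (const llama_token t : fim_tokens) {
        if (t < 0 || t >= n_vocab) {
            return false;
        }
    }

    return true;
}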

Steps to reproduce:

System:

  • OS: Arch Linux
  • GPU: RTX 4090

Building the library:

mkdir build
cd build
cmake -DLLAMA_CUDA=ON -DLLAMA_CURL=ON ..
cmake --build . --config Release -j

Starting the server:

mkdir -p models/7B
./server -ngl 43 -mu https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q2_K.gguf

Making an infill request:

curl --request POST \
--url http://localhost:8080/infill \
--header "Content-Type: application/json" \
--data '{
    "input_prefix": "def remove_non_ascii(s: str) -> str:\n    \"\"\" ",
    "input_suffix": "\n    return result\n",
    "prompt": ""
}'

Metadata

Labels

bug (Something isn't working), good first issue (Good for newcomers)
