
No variance in response from /chat/completions #117


Description

@mneedham

Hi,

I'm playing around with the temperature property when calling a model via the /chat/completions API, but I can't figure out how to get any variance in the responses. I have the temperature set to 0.8.

Here's how I start the server:

./llamafile-server-0.4 -m models/starling-lm-7b-alpha.Q4_K_M.gguf --nobrowser

And this is the way I call the API:

curl -s http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "gpt-3.5-turbo", "temperature": 0.8,
  "messages": [
    {
      "role": "system",
      "content": "You are a poetic assistant, skilled in explaining complex programming concepts with creative flair."
    },
    {
      "role": "user",
      "content": "Compose a poem that explains the concept of recursion in programming. A maximum of 5 lines"
    }
  ]
}' 2>/dev/null | jq -r '.choices[].message.content'
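
A possible sanity check (a sketch, not a confirmed fix): send an explicit seed with each request. The seed field is part of the OpenAI chat completions schema, and llama.cpp-based servers seed their sampler too, but whether this llamafile build forwards seed from /v1/chat/completions is an assumption to verify. If the output varies with the seed but not with temperature alone, the server is likely running with a fixed default seed:

# Same request as above, but with a per-request random seed.
# Hypothetical pass-through: verify this llamafile build honors "seed".
curl -s http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "gpt-3.5-turbo", "temperature": 0.8,
  "seed": '"$RANDOM"',
  "messages": [
    {
      "role": "system",
      "content": "You are a poetic assistant, skilled in explaining complex programming concepts with creative flair."
    },
    {
      "role": "user",
      "content": "Compose a poem that explains the concept of recursion in programming. A maximum of 5 lines"
    }
  ]
}' 2>/dev/null | jq -r '.choices[].message.content'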

Below is a video showing the same output three times in a row.

2023-12-17_10-55-15.mp4
