
Commit 22c9fc4

Merge pull request #27 from LuMarans30/main
Added local server setup to `installation` section
2 parents: b9b7f95 + f7ad745

File tree

1 file changed: +16 −0 lines


README.md

Lines changed: 16 additions & 0 deletions
````diff
@@ -58,6 +58,22 @@ python optillm.py
 2024-09-06 07:57:14,212 - INFO - Press CTRL+C to quit
 ```
 
+### Starting the optillm proxy for a local server (e.g. llama.cpp)
+
+- Set the `OPENAI_API_KEY` env variable to a placeholder value
+  - e.g. `export OPENAI_API_KEY="no_key"`
+- Run `./llama-server -c 4096 -m path_to_model` to start the server with the specified model and a context length of 4096 tokens
+- Run `python3 optillm.py --base_url base_url` to start the proxy
+  - e.g. for llama.cpp, run `python3 optillm.py --base_url http://localhost:8080/v1`
+
+> [!WARNING]
+> Note that llama-server currently does not support sampling multiple responses from a model, which limits the available approaches to the following:
+> `cot_reflection`, `leap`, `plansearch`, `rstar`, `rto`, `self_consistency`, and `z3`.
+> In order to use other approaches, consider using an alternative compatible server such as [ollama](https://github.com/ollama/ollama).
+
+> [!NOTE]
+> You'll later need to specify a model name in the OpenAI client configuration. Since llama-server was started with a single model, you can choose any name you want.
+
 ## Usage
 
 Once the proxy is running, you can use it as a drop in replacement for an OpenAI client by setting the `base_url` as `http://localhost:8000/v1`.
````
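For reference, a minimal sketch of that drop-in usage with the `openai` Python package (v1+), assuming the proxy is running on its default address `http://localhost:8000/v1`; the placeholder API key and the `local-model` name are arbitrary illustrative choices, per the note in the diff above:

```python
from openai import OpenAI

# Point the client at the optillm proxy instead of the OpenAI API.
client = OpenAI(
    api_key="no_key",                     # placeholder; the local server does not validate it
    base_url="http://localhost:8000/v1",  # optillm proxy address
)

# "local-model" is an arbitrary label: llama-server was started with a single
# model, so any name works (see the [!NOTE] in the diff above).
response = client.chat.completions.create(
    model="local-model",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```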
