
Commit e703e04

Updated llama-server command for a bigger context length
Added the `-c` parameter to the `llama-server` command to increase the context length to 4096 tokens, from the default of 2048 tokens.
1 parent 564fa63 commit e703e04

File tree

1 file changed (+1, -1 lines)


README.md

Lines changed: 1 addition & 1 deletion
@@ -62,7 +62,7 @@ python optillm.py
 
 - Set the `OPENAI_API_KEY` env variable to a placeholder value
 - e.g. `export OPENAI_API_KEY="no_key"`
-- Run `./llama-server -m path_to_model` to start the server with the specified model
+- Run `./llama-server -c 4096 -m path_to_model` to start the server with the specified model and a context length of 4096 tokens
 - Run `python3 optillm.py --base_url base_url` to start the proxy
 - e.g. for llama.cpp, run `python3 optillm.py --base_url http://localhost:8080/v1`
 
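For reference, the README steps touched by this change amount to roughly the following shell session. This is a sketch rather than part of the commit: `path_to_model` is a placeholder for an actual model file, and the proxy URL assumes `llama-server` is listening on its default port 8080, as in the README example.

```sh
# Placeholder value; per the README, a real OpenAI key is not needed for a local server.
export OPENAI_API_KEY="no_key"

# Start llama.cpp's server with a 4096-token context window (the -c flag added by this commit).
./llama-server -c 4096 -m path_to_model

# In a separate terminal, point the optillm proxy at the local llama-server.
python3 optillm.py --base_url http://localhost:8080/v1
```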

Comments (0)