From 1868b11a22ad01d3378a443a8d1fe8dd9c6ae0b5 Mon Sep 17 00:00:00 2001
From: Asankhaya Sharma
Date: Sat, 25 Jan 2025 05:24:23 +0800
Subject: [PATCH] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 83e7f618..062f065d 100644
--- a/README.md
+++ b/README.md
@@ -216,7 +216,7 @@ response = client.chat.completions.create(
 - e.g. for llama.cpp, run `python3 optillm.py --base_url http://localhost:8080/v1`
 
 > [!WARNING]
-> Note that the Anthropic API, llama-server (and ollama) currently does not support sampling multiple responses from a model, which limits the available approaches to the following:
+> The Anthropic API, llama.cpp-server, and ollama currently do not support sampling multiple responses from a model, which limits the available approaches to the following:
 > `cot_reflection`, `leap`, `plansearch`, `rstar`, `rto`, `self_consistency`, `re2`, and `z3`. For models on HuggingFace, you can use the built-in local inference server as it supports multiple responses.
 
 ## Implemented techniques