From 1868b11a22ad01d3378a443a8d1fe8dd9c6ae0b5 Mon Sep 17 00:00:00 2001
From: Asankhaya Sharma
Date: Sat, 25 Jan 2025 05:24:23 +0800
Subject: [PATCH] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 83e7f618..062f065d 100644
--- a/README.md
+++ b/README.md
@@ -216,7 +216,7 @@ response = client.chat.completions.create(
 - e.g. for llama.cpp, run `python3 optillm.py --base_url http://localhost:8080/v1`
 
 > [!WARNING]
-> Note that the Anthropic API, llama-server (and ollama) currently does not support sampling multiple responses from a model, which limits the available approaches to the following:
+> The Anthropic API, llama.cpp-server, and ollama currently do not support sampling multiple responses from a model, which limits the available approaches to the following:
 > `cot_reflection`, `leap`, `plansearch`, `rstar`, `rto`, `self_consistency`, `re2`, and `z3`. For models on HuggingFace, you can use the built-in local inference server as it supports multiple responses.
 
 ## Implemented techniques