Merged
26 commits
7a9f639
Test dummy image tags in chat templates
abetlen Jan 31, 2024
b7338a0
Merge branch 'main' into generic-vlm-chat-format
abetlen Apr 27, 2024
b78ed72
Format and improve types for llava_cpp.py
abetlen Apr 27, 2024
a3c3b5d
Add from_pretrained support to llava chat format.
abetlen Apr 27, 2024
d7b28f7
Refactor llava chat format to use a jinja2
abetlen Apr 27, 2024
3cef09c
Revert chat format test
abetlen Apr 27, 2024
2fd41f9
Add moondream support (wip)
abetlen Apr 27, 2024
7df9483
Update moondream chat format
abetlen Apr 27, 2024
1705893
Update moondream chat format
abetlen Apr 27, 2024
fd55c29
Update moondream prompt
abetlen Apr 27, 2024
94fe4bc
Add function calling support
abetlen Apr 27, 2024
0e182be
Cache last image embed
abetlen Apr 28, 2024
20e0967
Add Llava1.6 support
abetlen Apr 28, 2024
8324ee0
Add nanollava support
abetlen Apr 28, 2024
8f09d42
Add obisidian support
abetlen Apr 28, 2024
22c55cd
Merge branch 'main' into generic-vlm-chat-format
abetlen Apr 28, 2024
c89c6de
Merge branch 'main' into generic-vlm-chat-format
abetlen Apr 30, 2024
dd47dda
Remove unnecessary import
abetlen Apr 30, 2024
0b891f4
Re-order multimodal chat formats
abetlen Apr 30, 2024
0e15835
Logits all no longer required for multi-modal models
abetlen Apr 30, 2024
fc5d01c
Update README.md
abetlen Apr 30, 2024
f03326c
Update docs
abetlen Apr 30, 2024
efd99f1
Update README
abetlen Apr 30, 2024
6e4ad72
Fix typo
abetlen Apr 30, 2024
f70326f
Update README
abetlen Apr 30, 2024
64008aa
Fix typo
abetlen Apr 30, 2024
41 changes: 36 additions & 5 deletions README.md
@@ -490,14 +490,15 @@ Due to discrepancies between llama.cpp and HuggingFace's tokenizers, it is requi

### Multi-modal Models

`llama-cpp-python` supports the llava1.5 family of multi-modal models which allow the language model to
read information from both text and images.
`llama-cpp-python` supports multi-modal models such as llava1.5, which allow the language model to read information from both text and images.

You'll first need to download one of the available multi-modal models in GGUF format:

- [llava-v1.5-7b](https://huggingface.co/mys/ggml_llava-v1.5-7b)
- [llava-v1.5-13b](https://huggingface.co/mys/ggml_llava-v1.5-13b)
- [bakllava-1-7b](https://huggingface.co/mys/ggml_bakllava-1)
- [llava-v1.6-34b](https://huggingface.co/cjpais/llava-v1.6-34B-gguf)
- [moondream2](https://huggingface.co/vikhyatk/moondream2)
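
If you prefer to fetch the files ahead of time rather than at load time, a minimal sketch using `huggingface_hub` is shown below; the exact filenames vary by repository, so the ones used here are assumptions to verify against the repo's file listing:

```python
from huggingface_hub import hf_hub_download

# Filenames below are examples only; check the repository's file list for the exact names.
model_path = hf_hub_download(
    repo_id="mys/ggml_llava-v1.5-7b",
    filename="ggml-model-q4_k.gguf",   # assumed quantized language model file
)
clip_model_path = hf_hub_download(
    repo_id="mys/ggml_llava-v1.5-7b",
    filename="mmproj-model-f16.gguf",  # assumed CLIP/projection (mmproj) file
)
print(model_path, clip_model_path)
```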

Then you'll need to use a custom chat handler to load the clip model and process the chat messages and images.

@@ -509,22 +510,52 @@ Then you'll need to use a custom chat handler to load the clip model and process
model_path="./path/to/llava/llama-model.gguf",
chat_handler=chat_handler,
n_ctx=2048, # n_ctx should be increased to accommodate the image embedding
logits_all=True, # needed to make llava work
)
>>> llm.create_chat_completion(
messages = [
{"role": "system", "content": "You are an assistant who perfectly describes images."},
{
"role": "user",
"content": [
{"type": "image_url", "image_url": {"url": "https://.../image.png"}},
{"type" : "text", "text": "Describe this image in detail please."}
{"type" : "text", "text": "What's in this image?"},
{"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg" } }
]
}
]
)
```
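
The hunk above shows only the changed tail of the snippet; for context, a complete minimal sketch of the llava-1.5 setup (file paths are placeholders) would look roughly like this:

```python
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

# The mmproj (CLIP/projection) file ships alongside the GGUF language model.
chat_handler = Llava15ChatHandler(clip_model_path="path/to/llava/mmproj.bin")

llm = Llama(
    model_path="./path/to/llava/llama-model.gguf",
    chat_handler=chat_handler,
    n_ctx=2048,  # n_ctx should be increased to accommodate the image embedding
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an assistant who perfectly describes images."},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"}},
            ],
        },
    ]
)
print(response["choices"][0]["message"]["content"])
```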

You can also pull the model from the Hugging Face Hub using the `from_pretrained` method.

```python
>>> from llama_cpp import Llama
>>> from llama_cpp.llama_chat_format import MoondreamChatHandler
>>> chat_handler = MoondreamChatHandler.from_pretrained(
repo_id="vikhyatk/moondream2",
filename="*mmproj*",
)
>>> llm = Llama.from_pretrained(
repo_id="vikhyatk/moondream2"
filename="*text-model*",
chat_handler=chat_handler,
n_ctx=2048, # n_ctx should be increased to accommodate the image embedding
)
>>> llm.create_chat_completion(
messages = [
{
"role": "user",
"content": [
{"type" : "text", "text": "What's in this image?"},
{"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg" } }

]
}
]
)
```

**Note**: Multi-modal models also support tool calling and JSON mode.
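
As a rough illustration of JSON mode, here is a sketch that reuses the `llm` and chat handler configured above and requests JSON output via `response_format`:

```python
result = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You describe images as JSON."},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "List the main objects in this image."},
                {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"}},
            ],
        },
    ],
    # Constrain the output to valid JSON; a "schema" key can be added to constrain the structure further.
    response_format={"type": "json_object"},
)
print(result["choices"][0]["message"]["content"])
```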

<details>
<summary>Loading a Local Image</summary>
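
One common approach, sketched below, is to read the file and pass it as a base64 data URI in the `image_url` field; the helper name and file path are placeholders:

```python
import base64

def image_to_base64_data_uri(file_path):
    # Read the image bytes and wrap them in a data URI the chat handler accepts.
    with open(file_path, "rb") as img_file:
        base64_data = base64.b64encode(img_file.read()).decode("utf-8")
    return f"data:image/png;base64,{base64_data}"

data_uri = image_to_base64_data_uri("path/to/local/image.png")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in detail please."},
            {"type": "image_url", "image_url": {"url": data_uri}},
        ],
    }
]
```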

2 changes: 2 additions & 0 deletions docs/server.md
@@ -98,6 +98,8 @@ You'll first need to download one of the available multi-modal models in GGUF fo
- [llava-v1.5-7b](https://huggingface.co/mys/ggml_llava-v1.5-7b)
- [llava-v1.5-13b](https://huggingface.co/mys/ggml_llava-v1.5-13b)
- [bakllava-1-7b](https://huggingface.co/mys/ggml_bakllava-1)
- [llava-v1.6-34b](https://huggingface.co/cjpais/llava-v1.6-34B-gguf)
- [moondream2](https://huggingface.co/vikhyatk/moondream2)

Then, when you run the server, you'll also need to specify the path to the clip model used for image embedding and the `llava-1-5` chat_format
