
Conversation


@tikikun tikikun commented Nov 3, 2023

**IMPORTANT**

To enable continuous batching (multi-threading and multiple concurrent users), set the cont_batching value when loading the model.

Example:

curl -X POST 'http://localhost:3928/inferences/llamacpp/loadModel' \
     -H 'Content-Type: application/json' \
     -d '{
          "llama_model_path": "/Users/alandao/Documents/codes/nitro.cpp_temp/models/llama2_7b_chat_uncensored.Q4_0.gguf",
          "ctx_len": 2048,
          "ngl": 100,
          "cont_batching": true
     }'
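
For illustration, here is a minimal client sketch (not part of this PR) showing the kind of concurrent requests that continuous batching is meant to serve. The /inferences/llamacpp/chat_completion endpoint and its payload shape are assumptions about Nitro's OpenAI-compatible API, not something specified in this conversation.

import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL = "http://localhost:3928/inferences/llamacpp/chat_completion"  # assumed endpoint

def ask(prompt: str) -> str:
    # Send one chat completion request; with cont_batching enabled the server
    # can interleave several of these in a single decoding loop.
    body = json.dumps({
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }).encode()
    req = urllib.request.Request(
        URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()

# Fire several requests at once; with cont_batching enabled they are batched
# together instead of being processed strictly one after another.
prompts = ["Hello!", "What is continuous batching?", "Write a haiku about GPUs."]
with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
    for result in pool.map(ask, prompts):
        print(result)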

@tikikun tikikun self-assigned this Nov 3, 2023
@tikikun tikikun linked an issue Nov 3, 2023 that may be closed by this pull request
@tikikun tikikun merged commit d358274 into main Nov 6, 2023
@hiro-v hiro-v deleted the 41-feat-batch-inference-for-nitro branch November 20, 2023 16:42

Development

Successfully merging this pull request may close these issues.

feat: batch inference for nitro

4 participants