
Description
Goal
- Cortex can handle all llama.cpp params correctly
- Model running params (i.e. POST /v1/models/<model_id>/start)
- Inference params (i.e. POST /chat/completions)
- Function calling, e.g. for llama.cpp
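
To make the two param surfaces above concrete, a minimal sketch; the port, model id, and param names are illustrative assumptions, not the confirmed Cortex schema:

```python
import requests

BASE = "http://localhost:3928"  # assumed local Cortex server; port is illustrative

# Model running params: sent once, when the model is started.
requests.post(
    f"{BASE}/v1/models/llama3.1/start",
    json={"ctx_len": 4096, "ngl": 33},  # llama.cpp-style loading params (assumed names)
).raise_for_status()

# Inference params: sent per request.
resp = requests.post(
    f"{BASE}/chat/completions",
    json={
        "model": "llama3.1",
        "messages": [{"role": "user", "content": "Hello"}],
        "temperature": 0.7,
        "top_p": 0.9,
        "max_tokens": 256,
    },
)
print(resp.json())
```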
Tasklist
I am using this epic to aggregate all llama.cpp params issues, including Llama 3.1 function calling + tool use.
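
For the function calling piece, presumably this means accepting an OpenAI-style tools array on /chat/completions; a hedged sketch, assuming that schema (not a confirmed Cortex contract):

```python
import requests

# Hypothetical tool-use request; the OpenAI-style "tools" schema is an
# assumption here, not a confirmed Cortex contract.
resp = requests.post(
    "http://localhost:3928/chat/completions",
    json={
        "model": "llama3.1",
        "messages": [{"role": "user", "content": "What is the weather in Hanoi?"}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
    },
)
# A conforming response would carry tool_calls in the assistant message.
print(resp.json()["choices"][0]["message"])
```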
model.yaml
- model.yaml as optional? (i.e. fall back to the params embedded in the GGUF)
- model.yaml should be well documented, with appropriate naming conventions:
  - Model loading params?
  - Inference params?
  - Should engine params be excluded?
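
One possible shape for model.yaml that keeps the two param families separate (field names are illustrative assumptions, not a confirmed schema):

```yaml
# Hypothetical model.yaml sketch; field names are assumptions, not a confirmed schema.
# Model loading params (applied at POST /v1/models/<model_id>/start):
ctx_len: 4096
ngl: 33

# Inference param defaults (overridable per /chat/completions request):
temperature: 0.7
top_p: 0.9
max_tokens: 256
```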
Out-of-scope:
Related