Commit c5b4544: Update the cortex-cpp Readme (#1028)

Files changed: README.md (206 additions, 84 deletions), assets/cortex-banner.png (591 KB)

<p align="center">
<a href="https://cortex.so/docs/">Documentation</a> - <a href="https://cortex.so/api-reference">API Reference</a>
- <a href="https://github.com/janhq/cortex/releases">Changelog</a> - <a href="https://github.com/janhq/cortex/issues">Bug reports</a> - <a href="https://discord.gg/AsJ8krTT3N">Discord</a>
</p>

> ⚠️ **Cortex is currently in Development**: Expect breaking changes and bugs!
## About
Cortex is a C++ AI engine that comes with a Docker-like command-line interface and client libraries. It supports running AI models using `ONNX`, `TensorRT-LLM`, and `llama.cpp` engines. Cortex can function as a standalone server or be integrated as a library.

## Cortex Engines
Cortex supports the following engines (a short usage sketch follows this list):
- [`cortex.llamacpp`](https://github.com/janhq/cortex.llamacpp): a C++ inference library that can be dynamically loaded by any server at runtime. We use this engine to run GGUF models; `llama.cpp` is optimized for performance on both CPU and GPU.
- [`cortex.onnx`](https://github.com/janhq/cortex.onnx): a C++ inference library for Windows that leverages `onnxruntime-genai` and uses DirectML to provide GPU acceleration across a wide range of hardware and drivers, including AMD, Intel, NVIDIA, and Qualcomm GPUs.
- [`cortex.tensorrt-llm`](https://github.com/janhq/cortex.tensorrt-llm): a C++ inference library designed for NVIDIA GPUs. It incorporates NVIDIA's TensorRT-LLM for GPU-accelerated inference.
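To see which engines are available on your machine and to add one, you can use the `engines` subcommands documented in the CLI reference below (a minimal sketch; exact options may vary):

```bash
# List the engines Cortex knows about
cortex engines list

# Install the llama.cpp engine
cortex engines install cortex.llamacpp
```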

## Installation
### MacOS
```bash
brew install cortex-engine
```
### Windows
```bash
winget install cortex-engine
```
### Linux
```bash
sudo apt install cortex-engine
```
### Docker
**Coming Soon!**

### Libraries
- [cortex.js](https://github.com/janhq/cortex.js)
- [cortex.py](https://github.com/janhq/cortex-python)

### Build from Source

To install Cortex from source, follow the steps below:

```bash
chmod +x '[path-to]/cortex/cortex-js/dist/src/command.js'
npm link
```

## Quickstart
To run and chat with a model in Cortex:
```bash
# Start the Cortex server
cortex

# Start a model
cortex run [model_id]

# Chat with a model
cortex chat [model_id]
```
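For example, using one of the smaller models from the Model Library below (a concrete walk-through of the same steps; `cortex run` downloads the model on first use):

```bash
cortex
cortex run tinyllama:1b-gguf
cortex chat tinyllama:1b-gguf
```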
## Model Library
Cortex supports a list of models available on [Cortex Hub](https://huggingface.co/cortexso).

Here are examples of models you can use with each supported engine:
### `llama.cpp`
| Model ID | Variant (Branch) | Model size | CLI command |
|------------------|------------------|-------------------|------------------------------------|
| codestral | 22b-gguf | 22B | `cortex run codestral:22b-gguf` |
| command-r | 35b-gguf | 35B | `cortex run command-r:35b-gguf` |
| gemma | 7b-gguf | 7B | `cortex run gemma:7b-gguf` |
| llama3 | gguf | 8B | `cortex run llama3:gguf` |
| llama3.1 | gguf | 8B | `cortex run llama3.1:gguf` |
| mistral | 7b-gguf | 7B | `cortex run mistral:7b-gguf` |
| mixtral | 7x8b-gguf | 46.7B | `cortex run mixtral:7x8b-gguf` |
| openhermes-2.5 | 7b-gguf | 7B | `cortex run openhermes-2.5:7b-gguf`|
| phi3 | medium-gguf | 14B - 4k ctx len | `cortex run phi3:medium-gguf` |
| phi3 | mini-gguf | 3.82B - 4k ctx len| `cortex run phi3:mini-gguf` |
| qwen2 | 7b-gguf | 7B | `cortex run qwen2:7b-gguf` |
| tinyllama | 1b-gguf | 1.1B | `cortex run tinyllama:1b-gguf` |
### `ONNX`
| Model ID | Variant (Branch) | Model size | CLI command |
|------------------|------------------|-------------------|------------------------------------|
| gemma | 7b-onnx | 7B | `cortex run gemma:7b-onnx` |
| llama3 | onnx | 8B | `cortex run llama3:onnx` |
| mistral | 7b-onnx | 7B | `cortex run mistral:7b-onnx` |
| openhermes-2.5 | 7b-onnx | 7B | `cortex run openhermes-2.5:7b-onnx`|
| phi3 | mini-onnx | 3.82B - 4k ctx len| `cortex run phi3:mini-onnx` |
| phi3 | medium-onnx | 14B - 4k ctx len | `cortex run phi3:medium-onnx` |
### `TensorRT-LLM`
| Model ID | Variant (Branch) | Model size | CLI command |
|------------------|-------------------------------|-------------------|------------------------------------|
| llama3 | 8b-tensorrt-llm-windows-ampere | 8B | `cortex run llama3:8b-tensorrt-llm-windows-ampere` |
| llama3 | 8b-tensorrt-llm-linux-ampere | 8B | `cortex run llama3:8b-tensorrt-llm-linux-ampere` |
| llama3 | 8b-tensorrt-llm-linux-ada | 8B | `cortex run llama3:8b-tensorrt-llm-linux-ada`|
| llama3 | 8b-tensorrt-llm-windows-ada | 8B | `cortex run llama3:8b-tensorrt-llm-windows-ada` |
| mistral | 7b-tensorrt-llm-linux-ampere | 7B | `cortex run mistral:7b-tensorrt-llm-linux-ampere`|
| mistral | 7b-tensorrt-llm-windows-ampere | 7B | `cortex run mistral:7b-tensorrt-llm-windows-ampere` |
| mistral | 7b-tensorrt-llm-linux-ada | 7B | `cortex run mistral:7b-tensorrt-llm-linux-ada`|
| mistral | 7b-tensorrt-llm-windows-ada | 7B | `cortex run mistral:7b-tensorrt-llm-windows-ada` |
| openhermes-2.5 | 7b-tensorrt-llm-windows-ampere | 7B | `cortex run openhermes-2.5:7b-tensorrt-llm-windows-ampere`|
| openhermes-2.5 | 7b-tensorrt-llm-windows-ada | 7B | `cortex run openhermes-2.5:7b-tensorrt-llm-windows-ada`|
| openhermes-2.5 | 7b-tensorrt-llm-linux-ada | 7B | `cortex run openhermes-2.5:7b-tensorrt-llm-linux-ada`|
> **Note**:
> You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 14B models, and 32 GB to run the 32B models.
## Cortex CLI Commands
> **Note**:
> For more detailed CLI reference documentation, see [here](https://cortex.so/docs/cli).
### Start Cortex Server
```bash
cortex
```
### Chat with a Model
```bash
cortex chat [options] [model_id] [message]
```
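For example, to send a single message to the `mistral` model from the Model Library above (assuming it is already downloaded):

```bash
cortex chat mistral "What is the capital of France?"
```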
### Embeddings
```bash
cortex embeddings [options] [model_id] [message]
```
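For example, to create an embedding for a short piece of text with the `mistral` model (a sketch following the usage pattern above):

```bash
cortex embeddings mistral "Hello, world"
```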
### Pull a Model
```bash
cortex pull <model_id>
```
> This command can also pull models from Hugging Face.
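For example, you can pull a model by its Cortex Hub model ID, or from a Hugging Face repository (the repository name below is purely illustrative):

```bash
# Pull a model from the Cortex Hub
cortex pull mistral

# Pull a GGUF model hosted on Hugging Face (illustrative repository name)
cortex pull TheBloke/Mistral-7B-Instruct-v0.2-GGUF
```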
### Download and Start a Model
```bash
cortex run [options] [model_id]:[engine]
```
### Get Model Details
```bash
cortex models get <model_id>
```
### List Models
```bash
cortex models list [options]
```
### Remove a Model
```bash
cortex models remove <model_id>
```
### Start a Model
```bash
cortex models start [model_id]
```
### Stop a Model
```bash
cortex models stop <model_id>
```
### Update a Model Config
```bash
cortex models update [options] <model_id>
```
### Get Engine Details
```bash
cortex engines get <engine_name>
```
### Install an Engine
```bash
cortex engines install <engine_name> [options]
```
### List Engines
```bash
cortex engines list [options]
```
### Set an Engine Config
```bash
cortex engines set <engine_name> <config> <value>
```
### Show Model Information
```bash
cortex ps
```
## REST API
Cortex has a REST API that runs at `localhost:1337`.
### Pull a Model
```bash
curl --request POST \
  --url http://localhost:1337/v1/models/{model_id}/pull
```
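For example, to pull the `mistral` model used elsewhere in this README:

```bash
curl --request POST \
  --url http://localhost:1337/v1/models/mistral/pull
```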

### Start a Model
```bash
curl --request POST \
  --url http://localhost:1337/v1/models/{model_id}/start \
  --header 'Content-Type: application/json' \
  --data '{
    "prompt_template": "system\n{system_message}\nuser\n{prompt}\nassistant",
    "stop": [],
    "ngl": 4096,
    "ctx_len": 4096,
    "cpu_threads": 10,
    "n_batch": 2048,
    "caching_enabled": true,
    "grp_attn_n": 1,
    "grp_attn_w": 512,
    "mlock": false,
    "flash_attn": true,
    "cache_type": "f16",
    "use_mmap": true,
    "engine": "cortex.llamacpp"
  }'
```

### Chat with a Model
```bash
curl http://localhost:1337/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral",
    "messages": [
      {
        "role": "user",
        "content": "Hello"
      }
    ],
    "stream": true,
    "max_tokens": 1,
    "stop": [
      null
    ],
    "frequency_penalty": 1,
    "presence_penalty": 1,
    "temperature": 1,
    "top_p": 1
  }'
```
### Stop a Model
```bash
curl --request POST \
  --url http://localhost:1337/v1/models/mistral/stop
```
> **Note**: Check our [API documentation](https://cortex.so/api-reference) for a full list of available endpoints.
## Contact Support
- For support, please file a [GitHub ticket](https://github.com/janhq/cortex/issues/new/choose).
- For questions, join our Discord [here](https://discord.gg/FTk2MvZwJH).
