diff --git a/README.md b/README.md
index 660664159..91a66c318 100644
--- a/README.md
+++ b/README.md
@@ -4,14 +4,14 @@
-Documentation - API Reference
+Documentation - API Reference - Changelog - Bug reports - Discord

 > ⚠️ **Cortex is currently in Development**: Expect breaking changes and bugs!

 ## About
-Cortex is an OpenAI-compatible AI engine that developers can use to build LLM apps. It is packaged with a Docker-inspired command-line interface and client libraries. It can be used as a standalone server or imported as a library.
+Cortex is a C++ AI engine that comes with a Docker-like command-line interface and client libraries. It supports running AI models using the `ONNX`, `TensorRT-LLM`, and `llama.cpp` engines. Cortex can function as a standalone server or be integrated as a library.

 ## Cortex Engines
 Cortex supports the following engines:
@@ -19,62 +19,27 @@ Cortex supports the following engines:
 - [`cortex.onnx` Repository](https://github.com/janhq/cortex.onnx): `cortex.onnx` is a C++ inference library for Windows that leverages `onnxruntime-genai` and uses DirectML to provide GPU acceleration across a wide range of hardware and drivers, including AMD, Intel, NVIDIA, and Qualcomm GPUs.
 - [`cortex.tensorrt-llm`](https://github.com/janhq/cortex.tensorrt-llm): `cortex.tensorrt-llm` is a C++ inference library designed for NVIDIA GPUs. It incorporates NVIDIA’s TensorRT-LLM for GPU-accelerated inference.
-## Quicklinks
-
-- [Homepage](https://cortex.so/)
-- [Docs](https://cortex.so/docs/)
-
-## Quickstart
-### Prerequisites
-- **OS**:
-  - MacOSX 13.6 or higher.
-  - Windows 10 or higher.
-  - Ubuntu 22.04 and later.
-- **Dependencies**:
-  - **Node.js**: Version 18 and above is required to run the installation.
-  - **NPM**: Needed to manage packages.
-  - **CPU Instruction Sets**: Available for download from the [Cortex GitHub Releases](https://github.com/janhq/cortex/releases) page.
-  - **OpenMPI**: Required for Linux. Install by using the following command:
-    ```bash
-    sudo apt install openmpi-bin libopenmpi-dev
-    ```
-
-> Visit [Quickstart](https://cortex.so/docs/quickstart) to get started.
-
-### NPM
-``` bash
-# Install using NPM
-npm i -g cortexso
-# Run model
-cortex run mistral
-# To uninstall globally using NPM
-npm uninstall -g cortexso
+## Installation
+### macOS
+```bash
+brew install cortex-engine
 ```
-
-### Homebrew
-``` bash
-# Install using Brew
-brew install cortexso
-# Run model
-cortex run mistral
-# To uninstall using Brew
-brew uninstall cortexso
+### Windows
+```bash
+winget install cortex-engine
 ```
-> You can also install Cortex using the Cortex Installer available on [GitHub Releases](https://github.com/janhq/cortex/releases).
-
-## Cortex Server
+### Linux
 ```bash
-cortex serve
-
-# Output
-# Started server at http://localhost:1337
-# Swagger UI available at http://localhost:1337/api
+sudo apt install cortex-engine
 ```
+### Docker
+**Coming Soon!**

-You can now access the Cortex API server at `http://localhost:1337`,
-and the Swagger UI at `http://localhost:1337/api`.
+### Libraries
+- [cortex.js](https://github.com/janhq/cortex.js)
+- [cortex.py](https://github.com/janhq/cortex-python)

-## Build from Source
+### Build from Source

 To install Cortex from the source, follow the steps below:

@@ -98,42 +63,199 @@ chmod +x '[path-to]/cortex/cortex-js/dist/src/command.js'
 npm link
 ```
+
+## Quickstart
+To run and chat with a model in Cortex:
+```bash
+# Start the Cortex server
+cortex
+
+# Start a model
+cortex run [model_id]
+
+# Chat with a model
+cortex chat [model_id]
+```
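+
+If you prefer to call the server over HTTP instead of the CLI, the sketch below shows one way to do it. It assumes the server still listens on `http://localhost:1337` (the address the previous README reported) and exposes an OpenAI-compatible `/v1/chat/completions` route; both the port and the path are assumptions, so adjust them to your installation:
+```bash
+# Hypothetical example: query a running model over Cortex's HTTP API.
+# Assumes an OpenAI-compatible /v1/chat/completions endpoint on port 1337.
+curl http://localhost:1337/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "mistral",
+    "messages": [{"role": "user", "content": "Hello!"}]
+  }'
+```
+If the API is OpenAI-compatible, as the previous README stated, existing OpenAI client libraries should also work when pointed at the Cortex server's base URL.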
+## Model Library
+Cortex supports a list of models available on [Cortex Hub](https://huggingface.co/cortexso).
+
+Here are examples of models that you can use, based on each supported engine:
+### `llama.cpp`
+| Model ID         | Variant (Branch) | Model size         | CLI command                         |
+|------------------|------------------|--------------------|-------------------------------------|
+| codestral        | 22b-gguf         | 22B                | `cortex run codestral:22b-gguf`     |
+| command-r        | 35b-gguf         | 35B                | `cortex run command-r:35b-gguf`     |
+| gemma            | 7b-gguf          | 7B                 | `cortex run gemma:7b-gguf`          |
+| llama3           | gguf             | 8B                 | `cortex run llama3:gguf`            |
+| llama3.1         | gguf             | 8B                 | `cortex run llama3.1:gguf`          |
+| mistral          | 7b-gguf          | 7B                 | `cortex run mistral:7b-gguf`        |
+| mixtral          | 7x8b-gguf        | 46.7B              | `cortex run mixtral:7x8b-gguf`      |
+| openhermes-2.5   | 7b-gguf          | 7B                 | `cortex run openhermes-2.5:7b-gguf` |
+| phi3             | medium-gguf      | 14B - 4k ctx len   | `cortex run phi3:medium-gguf`       |
+| phi3             | mini-gguf        | 3.82B - 4k ctx len | `cortex run phi3:mini-gguf`         |
+| qwen2            | 7b-gguf          | 7B                 | `cortex run qwen2:7b-gguf`          |
+| tinyllama        | 1b-gguf          | 1.1B               | `cortex run tinyllama:1b-gguf`      |
+### `ONNX`
+| Model ID         | Variant (Branch) | Model size         | CLI command                         |
+|------------------|------------------|--------------------|-------------------------------------|
+| gemma            | 7b-onnx          | 7B                 | `cortex run gemma:7b-onnx`          |
+| llama3           | onnx             | 8B                 | `cortex run llama3:onnx`            |
+| mistral          | 7b-onnx          | 7B                 | `cortex run mistral:7b-onnx`        |
+| openhermes-2.5   | 7b-onnx          | 7B                 | `cortex run openhermes-2.5:7b-onnx` |
+| phi3             | mini-onnx        | 3.82B - 4k ctx len | `cortex run phi3:mini-onnx`         |
+| phi3             | medium-onnx      | 14B - 4k ctx len   | `cortex run phi3:medium-onnx`       |
+### `TensorRT-LLM`
+| Model ID         | Variant (Branch)               | Model size | CLI command                                                |
+|------------------|--------------------------------|------------|------------------------------------------------------------|
+| llama3           | 8b-tensorrt-llm-windows-ampere | 8B         | `cortex run llama3:8b-tensorrt-llm-windows-ampere`         |
+| llama3           | 8b-tensorrt-llm-linux-ampere   | 8B         | `cortex run llama3:8b-tensorrt-llm-linux-ampere`           |
+| llama3           | 8b-tensorrt-llm-linux-ada      | 8B         | `cortex run llama3:8b-tensorrt-llm-linux-ada`              |
+| llama3           | 8b-tensorrt-llm-windows-ada    | 8B         | `cortex run llama3:8b-tensorrt-llm-windows-ada`            |
+| mistral          | 7b-tensorrt-llm-linux-ampere   | 7B         | `cortex run mistral:7b-tensorrt-llm-linux-ampere`          |
+| mistral          | 7b-tensorrt-llm-windows-ampere | 7B         | `cortex run mistral:7b-tensorrt-llm-windows-ampere`        |
+| mistral          | 7b-tensorrt-llm-linux-ada      | 7B         | `cortex run mistral:7b-tensorrt-llm-linux-ada`             |
+| mistral          | 7b-tensorrt-llm-windows-ada    | 7B         | `cortex run mistral:7b-tensorrt-llm-windows-ada`           |
+| openhermes-2.5   | 7b-tensorrt-llm-windows-ampere | 7B         | `cortex run openhermes-2.5:7b-tensorrt-llm-windows-ampere` |
+| openhermes-2.5   | 7b-tensorrt-llm-windows-ada    | 7B         | `cortex run openhermes-2.5:7b-tensorrt-llm-windows-ada`    |
+| openhermes-2.5   | 7b-tensorrt-llm-linux-ada      | 7B         | `cortex run openhermes-2.5:7b-tensorrt-llm-linux-ada`      |
+
+> **Note**:
+> You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 14B models, and 32 GB to run the 32B models.
+
 ## Cortex CLI Commands
+> **Note**:
+> For more detailed CLI reference documentation, please see [here](https://cortex.so/docs/cli).
+### Start Cortex Server
+```bash
+cortex
+```
+### Chat with a Model
+```bash
+cortex chat [options] [model_id] [message]
+```
+### Embeddings
+```bash
+cortex embeddings [options] [model_id] [message]
+```
+### Pull a Model
+```bash
+cortex pull