diff --git a/README.md b/README.md
index 660664159..91a66c318 100644
--- a/README.md
+++ b/README.md
@@ -4,14 +4,14 @@

-Documentation - API Reference
+Documentation - API Reference - Changelog - Bug reports - Discord

 > ⚠️ **Cortex is currently in Development**: Expect breaking changes and bugs!
 
 ## About
-Cortex is an OpenAI-compatible AI engine that developers can use to build LLM apps. It is packaged with a Docker-inspired command-line interface and client libraries. It can be used as a standalone server or imported as a library.
+Cortex is a C++ AI engine that comes with a Docker-like command-line interface and client libraries. It supports running AI models using the `ONNX`, `TensorRT-LLM`, and `llama.cpp` engines. Cortex can function as a standalone server or be integrated as a library.
 
 ## Cortex Engines
 Cortex supports the following engines:
@@ -19,62 +19,27 @@ Cortex supports the following engines:
 - [`cortex.onnx` Repository](https://github.com/janhq/cortex.onnx): `cortex.onnx` is a C++ inference library for Windows that leverages `onnxruntime-genai` and uses DirectML to provide GPU acceleration across a wide range of hardware and drivers, including AMD, Intel, NVIDIA, and Qualcomm GPUs.
 - [`cortex.tensorrt-llm`](https://github.com/janhq/cortex.tensorrt-llm): `cortex.tensorrt-llm` is a C++ inference library designed for NVIDIA GPUs. It incorporates NVIDIA’s TensorRT-LLM for GPU-accelerated inference.
-## Quicklinks
-
-- [Homepage](https://cortex.so/)
-- [Docs](https://cortex.so/docs/)
-
-## Quickstart
-### Prerequisites
-- **OS**:
-  - MacOSX 13.6 or higher.
-  - Windows 10 or higher.
-  - Ubuntu 22.04 and later.
-- **Dependencies**:
-  - **Node.js**: Version 18 and above is required to run the installation.
-  - **NPM**: Needed to manage packages.
-  - **CPU Instruction Sets**: Available for download from the [Cortex GitHub Releases](https://github.com/janhq/cortex/releases) page.
-  - **OpenMPI**: Required for Linux. Install by using the following command:
-    ```bash
-    sudo apt install openmpi-bin libopenmpi-dev
-    ```
-
-> Visit [Quickstart](https://cortex.so/docs/quickstart) to get started.
-
-### NPM
-``` bash
-# Install using NPM
-npm i -g cortexso
-# Run model
-cortex run mistral
-# To uninstall globally using NPM
-npm uninstall -g cortexso
+## Installation
+### macOS
+```bash
+brew install cortex-engine
 ```
-
-### Homebrew
-``` bash
-# Install using Brew
-brew install cortexso
-# Run model
-cortex run mistral
-# To uninstall using Brew
-brew uninstall cortexso
+### Windows
+```bash
+winget install cortex-engine
 ```
-> You can also install Cortex using the Cortex Installer available on [GitHub Releases](https://github.com/janhq/cortex/releases).
-
-## Cortex Server
+### Linux
 ```bash
-cortex serve
-
-# Output
-# Started server at http://localhost:1337
-# Swagger UI available at http://localhost:1337/api
+sudo apt install cortex-engine
 ```
+### Docker
+**Coming Soon!**
 
-You can now access the Cortex API server at `http://localhost:1337`,
-and the Swagger UI at `http://localhost:1337/api`.
+### Libraries
+- [cortex.js](https://github.com/janhq/cortex.js)
+- [cortex.py](https://github.com/janhq/cortex-python)
 
-## Build from Source
+### Build from Source
 
 To install Cortex from the source, follow the steps below:
 
@@ -98,42 +63,199 @@ chmod +x '[path-to]/cortex/cortex-js/dist/src/command.js'
 npm link
 ```
+
+## Quickstart
+To run and chat with a model in Cortex:
+```bash
+# Start the Cortex server
+cortex
+
+# Start a model
+cortex run [model_id]
+
+# Chat with a model
+cortex chat [model_id]
+```
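+
+For example, here is an illustrative session with the `mistral` model from the Model Library below (the first `run` downloads the model if needed):
+```bash
+# Download (if needed) and start the model, then chat with it
+cortex run mistral:7b-gguf
+```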
+## Model Library
+Cortex supports a list of models available on [Cortex Hub](https://huggingface.co/cortexso).
+
+Here are examples of models that you can use, based on each supported engine:
+### `llama.cpp`
+| Model ID         | Variant (Branch) | Model size        | CLI command                        |
+|------------------|------------------|-------------------|------------------------------------|
+| codestral        | 22b-gguf         | 22B               | `cortex run codestral:22b-gguf`    |
+| command-r        | 35b-gguf         | 35B               | `cortex run command-r:35b-gguf`    |
+| gemma            | 7b-gguf          | 7B                | `cortex run gemma:7b-gguf`         |
+| llama3           | gguf             | 8B                | `cortex run llama3:gguf`           |
+| llama3.1         | gguf             | 8B                | `cortex run llama3.1:gguf`         |
+| mistral          | 7b-gguf          | 7B                | `cortex run mistral:7b-gguf`       |
+| mixtral          | 7x8b-gguf        | 46.7B             | `cortex run mixtral:7x8b-gguf`     |
+| openhermes-2.5   | 7b-gguf          | 7B                | `cortex run openhermes-2.5:7b-gguf`|
+| phi3             | medium-gguf      | 14B - 4k ctx len  | `cortex run phi3:medium-gguf`      |
+| phi3             | mini-gguf        | 3.82B - 4k ctx len| `cortex run phi3:mini-gguf`        |
+| qwen2            | 7b-gguf          | 7B                | `cortex run qwen2:7b-gguf`         |
+| tinyllama        | 1b-gguf          | 1.1B              | `cortex run tinyllama:1b-gguf`     |
+### `ONNX`
+| Model ID         | Variant (Branch) | Model size        | CLI command                        |
+|------------------|------------------|-------------------|------------------------------------|
+| gemma            | 7b-onnx          | 7B                | `cortex run gemma:7b-onnx`         |
+| llama3           | onnx             | 8B                | `cortex run llama3:onnx`           |
+| mistral          | 7b-onnx          | 7B                | `cortex run mistral:7b-onnx`       |
+| openhermes-2.5   | 7b-onnx          | 7B                | `cortex run openhermes-2.5:7b-onnx`|
+| phi3             | mini-onnx        | 3.82B - 4k ctx len| `cortex run phi3:mini-onnx`        |
+| phi3             | medium-onnx      | 14B - 4k ctx len  | `cortex run phi3:medium-onnx`      |
+### `TensorRT-LLM`
+| Model ID         | Variant (Branch)               | Model size | CLI command                                               |
+|------------------|--------------------------------|------------|-----------------------------------------------------------|
+| llama3           | 8b-tensorrt-llm-windows-ampere | 8B         | `cortex run llama3:8b-tensorrt-llm-windows-ampere`        |
+| llama3           | 8b-tensorrt-llm-linux-ampere   | 8B         | `cortex run llama3:8b-tensorrt-llm-linux-ampere`          |
+| llama3           | 8b-tensorrt-llm-linux-ada      | 8B         | `cortex run llama3:8b-tensorrt-llm-linux-ada`             |
+| llama3           | 8b-tensorrt-llm-windows-ada    | 8B         | `cortex run llama3:8b-tensorrt-llm-windows-ada`           |
+| mistral          | 7b-tensorrt-llm-linux-ampere   | 7B         | `cortex run mistral:7b-tensorrt-llm-linux-ampere`         |
+| mistral          | 7b-tensorrt-llm-windows-ampere | 7B         | `cortex run mistral:7b-tensorrt-llm-windows-ampere`       |
+| mistral          | 7b-tensorrt-llm-linux-ada      | 7B         | `cortex run mistral:7b-tensorrt-llm-linux-ada`            |
+| mistral          | 7b-tensorrt-llm-windows-ada    | 7B         | `cortex run mistral:7b-tensorrt-llm-windows-ada`          |
+| openhermes-2.5   | 7b-tensorrt-llm-windows-ampere | 7B         | `cortex run openhermes-2.5:7b-tensorrt-llm-windows-ampere`|
+| openhermes-2.5   | 7b-tensorrt-llm-windows-ada    | 7B         | `cortex run openhermes-2.5:7b-tensorrt-llm-windows-ada`   |
+| openhermes-2.5   | 7b-tensorrt-llm-linux-ada      | 7B         | `cortex run openhermes-2.5:7b-tensorrt-llm-linux-ada`     |
+
+> **Note**:
+> You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 14B models, and 32 GB to run the 32B models.
+
 ## Cortex CLI Commands
+> **Note**:
+> For more detailed CLI reference documentation, see [here](https://cortex.so/docs/cli).
+### Start Cortex Server
+```bash
+cortex
+```
+### Chat with a Model
+```bash
+cortex chat [options] [model_id] [message]
+```
+### Embeddings
+```bash
+cortex embeddings [options] [model_id] [message]
+```
+### Pull a Model
+```bash
+cortex pull
+```
+> This command can also pull models from Hugging Face.
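+For example, a hypothetical invocation pulling a model by its Hub id (this assumes `pull` accepts the same model ids shown in the Model Library above):
+```bash
+cortex pull mistral
+```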
+### Download and Start a Model
+```bash
+cortex run [options] [model_id]:[engine]
+```
+### Get a Model's Details
+```bash
+cortex models get
+```
+### List Models
+```bash
+cortex models list [options]
+```
+### Remove a Model
+```bash
+cortex models remove
+```
+### Start a Model
+```bash
+cortex models start [model_id]
+```
+### Stop a Model
+```bash
+cortex models stop
+```
+### Update a Model Config
+```bash
+cortex models update [options]
+```
+### Get an Engine's Details
+```bash
+cortex engines get
+```
+### Install an Engine
+```bash
+cortex engines install [options]
+```
+### List Engines
+```bash
+cortex engines list [options]
+```
+### Set an Engine Config
+```bash
+cortex engines set
+```
+### Show Running Models
+```bash
+cortex ps
+```
+## REST API
+Cortex has a REST API that runs at `localhost:1337`.
+
+### Pull a Model
+```bash
+curl --request POST \
+  --url http://localhost:1337/v1/models/{model_id}/pull
+```
-
-The following CLI commands are currently available.
-See [CLI Reference Docs](https://cortex.so/docs/cli) for more information.
-
-```bash
-
-  serve              Providing API endpoint for Cortex backend.
-  chat               Send a chat request to a model.
-  init|setup         Init settings and download cortex's dependencies.
-  ps                 Show running models and their status.
-  kill               Kill running cortex processes.
-  pull|download      Download a model. Working with HuggingFace model id.
-  run [options]      EXPERIMENTAL: Shortcut to start a model and chat.
-  models             Subcommands for managing models.
-  models list        List all available models.
-  models pull        Download a specified model.
-  models remove      Delete a specified model.
-  models get         Retrieve the configuration of a specified model.
-  models start       Start a specified model.
-  models stop        Stop a specified model.
-  models update      Update the configuration of a specified model.
-  benchmark          Benchmark and analyze the performance of a specific AI model using your system.
-  presets            Show all the available model presets within Cortex.
-  telemetry          Retrieve telemetry logs for monitoring and analysis.
-  embeddings         Creates an embedding vector representing the input text.
-  engines            Subcommands for managing engines.
-  engines get        Get an engine details.
-  engines list       Get all the available Cortex engines.
-  engines init       Setup and download the required dependencies to run cortex engines.
-  configs            Subcommands for managing configurations.
-  configs get        Get a configuration details.
-  configs list       Get all the available configurations.
-  configs set        Set a configuration.
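+For example, substituting `mistral` for `{model_id}`:
+```bash
+curl --request POST \
+  --url http://localhost:1337/v1/models/mistral/pull
+```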
+### Start a Model
+```bash
+curl --request POST \
+  --url http://localhost:1337/v1/models/{model_id}/start \
+  --header 'Content-Type: application/json' \
+  --data '{
+  "prompt_template": "system\n{system_message}\nuser\n{prompt}\nassistant",
+  "stop": [],
+  "ngl": 4096,
+  "ctx_len": 4096,
+  "cpu_threads": 10,
+  "n_batch": 2048,
+  "caching_enabled": true,
+  "grp_attn_n": 1,
+  "grp_attn_w": 512,
+  "mlock": false,
+  "flash_attn": true,
+  "cache_type": "f16",
+  "use_mmap": true,
+  "engine": "cortex.llamacpp"
+}'
 ```
+### Chat with a Model
+```bash
+curl http://localhost:1337/v1/chat/completions \
+-H "Content-Type: application/json" \
+-d '{
+  "model": "mistral",
+  "messages": [
+    {
+      "role": "user",
+      "content": "Hello"
+    }
+  ],
+  "stream": true,
+  "max_tokens": 1,
+  "stop": [
+    null
+  ],
+  "frequency_penalty": 1,
+  "presence_penalty": 1,
+  "temperature": 1,
+  "top_p": 1
+}'
+```
+
+### Stop a Model
+```bash
+curl --request POST \
+  --url http://localhost:1337/v1/models/mistral/stop
+```
+
+
+> **Note**: Check our [API documentation](https://cortex.so/api-reference) for a full list of available endpoints.
+
 ## Contact Support
 - For support, please file a [GitHub ticket](https://github.com/janhq/cortex/issues/new/choose).
 - For questions, join our Discord [here](https://discord.gg/FTk2MvZwJH).
diff --git a/assets/cortex-banner.png b/assets/cortex-banner.png
index c8ce05c00..95a7262a9 100644
Binary files a/assets/cortex-banner.png and b/assets/cortex-banner.png differ