Commit e5c7cd0

Merge pull request #6 from sammcj/docker
feat(docker,auth): Add Docker, Compose, Auth, Parameter envs
2 parents: 04ee742 + a8bc862

4 files changed: +285 additions, −69 deletions


Dockerfile

Lines changed: 47 additions & 0 deletions
```dockerfile
# Build stage
FROM python:3.12-slim AS builder

# Set working directory
WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc libc6-dev \
    && rm -rf /var/lib/apt/lists/*

# Copy only the requirements file first to leverage Docker cache
COPY requirements.txt .

# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Final stage
FROM python:3.12-slim

# Install curl for the healthcheck
RUN apt-get update && apt-get install -y --no-install-recommends \
    curl && \
    apt-get clean && rm -rf /var/lib/apt/lists/*

# Set working directory
WORKDIR /app

# Copy installed dependencies from builder stage
COPY --from=builder /usr/local/lib/python3.12/site-packages /usr/local/lib/python3.12/site-packages
COPY --from=builder /usr/local/bin /usr/local/bin

# Copy application code
COPY . .

# Create a non-root user and switch to it
RUN useradd -m appuser
USER appuser

# Set environment variables
ENV PYTHONUNBUFFERED=1

# Expose the port the app runs on
EXPOSE 8000

# Run the application
ENTRYPOINT ["python", "optillm.py"]
```

README.md

Lines changed: 93 additions & 26 deletions
````diff
@@ -6,15 +6,16 @@ optillm is an OpenAI API compatible optimizing inference proxy which implements
 
 ### plansearch-gpt-4o-mini on LiveCodeBench (Sep 2024)
 
-| Model | pass@1 | pass@5 | pass@10 |
-|-------|--------|--------|---------|
-| plansearch-gpt-4o-mini | 44.03 | 59.31 | 63.5 |
-| gpt-4o-mini | 43.9 | 50.61 | 53.25 |
-| claude-3.5-sonnet | 51.3 | | |
-| gpt-4o-2024-05-13 | 45.2 | | |
-| gpt-4-turbo-2024-04-09 | 44.2 | | |
+| Model                  | pass@1 | pass@5 | pass@10 |
+| ---------------------- | ------ | ------ | ------- |
+| plansearch-gpt-4o-mini | 44.03  | 59.31  | 63.5    |
+| gpt-4o-mini            | 43.9   | 50.61  | 53.25   |
+| claude-3.5-sonnet      | 51.3   |        |         |
+| gpt-4o-2024-05-13      | 45.2   |        |         |
+| gpt-4-turbo-2024-04-09 | 44.2   |        |         |
 
 ### moa-gpt-4o-mini on Arena-Hard-Auto (Aug 2024)
+
 ![Results showing Mixture of Agents approach using gpt-4o-mini on Arena Hard Auto Benchmark](./moa-results.png)
 
 ## Installation
@@ -32,7 +33,7 @@ pip install -r requirements.txt
 You can then run the optillm proxy as follows.
 
 ```bash
-python optillm.py 
+python optillm.py
 2024-09-06 07:57:14,191 - INFO - Starting server with approach: auto
 2024-09-06 07:57:14,191 - INFO - Server configuration: {'approach': 'auto', 'mcts_simulations': 2, 'mcts_exploration': 0.2, 'mcts_depth': 1, 'best_of_n': 3, 'model': 'gpt-4o-mini', 'rstar_max_depth': 3, 'rstar_num_rollouts': 5, 'rstar_c': 1.4, 'base_url': ''}
 * Serving Flask app 'optillm'
@@ -44,11 +45,11 @@ python optillm.py
 2024-09-06 07:57:14,212 - INFO - Press CTRL+C to quit
 ```
 
-### Usage
+## Usage
 
-Once the proxy is running, you can just use it as a drop in replacement for an OpenAI client by setting the `base_url` as `http://localhost:8000/v1`.
+Once the proxy is running, you can use it as a drop in replacement for an OpenAI client by setting the `base_url` as `http://localhost:8000/v1`.
 
-```bash
+```python
 import os
 from openai import OpenAI
 
@@ -70,7 +71,7 @@ response = client.chat.completions.create(
 print(response)
 ```
 
-You can control the technique you use for optimization by prepending the slug to the model name `{slug}-model-name`. E.g. in the above code we are using `moa` or 
+You can control the technique you use for optimization by prepending the slug to the model name `{slug}-model-name`. E.g. in the above code we are using `moa` or
 mixture of agents as the optimization approach. In the proxy logs you will see the following showing the `moa` is been used with the base model as `gpt-4o-mini`.
 
 ```bash
@@ -83,20 +84,86 @@ mixture of agents as the optimization approach. In the proxy logs you will see t
 
 ## Implemented techniques
 
-| Technique | Slug | Description |
-|-----------|----------------|-------------|
-| Agent | `agent ` | Determines which of the below approaches to take and then combines the results |
-| Monte Carlo Tree Search | `mcts` | Uses MCTS for decision-making in chat responses |
-| Best of N Sampling | `bon` | Generates multiple responses and selects the best one |
-| Mixture of Agents | `moa` | Combines responses from multiple critiques |
-| Round Trip Optimization | `rto` | Optimizes responses through a round-trip process |
-| Z3 Solver | `z3` | Utilizes the Z3 theorem prover for logical reasoning |
-| Self-Consistency | `self_consistency` | Implements an advanced self-consistency method |
-| PV Game | `pvg` | Applies a prover-verifier game approach at inference time |
-| R* Algorithm | `rstar` | Implements the R* algorithm for problem-solving |
-| CoT with Reflection | `cot_reflection` | Implements chain-of-thought reasoning with \<thinking\>, \<reflection> and \<output\> sections |
-| PlanSearch | `plansearch` | Implements a search algorithm over candidate plans for solving a problem in natural language |
-| LEAP | `leap` | Learns task-specific principles from few shot examples |
+| Technique               | Slug               | Description                                                                                    |
+| ----------------------- | ------------------ | ---------------------------------------------------------------------------------------------- |
+| Agent                   | `agent`            | Determines which of the below approaches to take and then combines the results                 |
+| Monte Carlo Tree Search | `mcts`             | Uses MCTS for decision-making in chat responses                                                |
+| Best of N Sampling      | `bon`              | Generates multiple responses and selects the best one                                          |
+| Mixture of Agents       | `moa`              | Combines responses from multiple critiques                                                     |
+| Round Trip Optimization | `rto`              | Optimizes responses through a round-trip process                                               |
+| Z3 Solver               | `z3`               | Utilizes the Z3 theorem prover for logical reasoning                                           |
+| Self-Consistency        | `self_consistency` | Implements an advanced self-consistency method                                                 |
+| PV Game                 | `pvg`              | Applies a prover-verifier game approach at inference time                                      |
+| R* Algorithm            | `rstar`            | Implements the R* algorithm for problem-solving                                                |
+| CoT with Reflection     | `cot_reflection`   | Implements chain-of-thought reasoning with \<thinking\>, \<reflection> and \<output\> sections |
+| PlanSearch              | `plansearch`       | Implements a search algorithm over candidate plans for solving a problem in natural language   |
+| LEAP                    | `leap`             | Learns task-specific principles from few shot examples                                         |
+
+## Available Parameters
+
+optillm supports various command-line arguments and environment variables for configuration.
+
+| Parameter                | Description                                                     | Default Value   |
+|--------------------------|-----------------------------------------------------------------|-----------------|
+| `--approach`             | Inference approach to use                                       | `"auto"`        |
+| `--simulations`          | Number of MCTS simulations                                      | 2               |
+| `--exploration`          | Exploration weight for MCTS                                     | 0.2             |
+| `--depth`                | Simulation depth for MCTS                                       | 1               |
+| `--best-of-n`            | Number of samples for best_of_n approach                        | 3               |
+| `--model`                | OpenAI model to use                                             | `"gpt-4o-mini"` |
+| `--base-url`             | Base URL for OpenAI compatible endpoint                         | `""`            |
+| `--rstar-max-depth`      | Maximum depth for rStar algorithm                               | 3               |
+| `--rstar-num-rollouts`   | Number of rollouts for rStar algorithm                          | 5               |
+| `--rstar-c`              | Exploration constant for rStar algorithm                        | 1.4             |
+| `--n`                    | Number of final responses to be returned                        | 1               |
+| `--return-full-response` | Return the full response including the CoT with <thinking> tags | `False`         |
+| `--port`                 | Specify the port to run the proxy                               | 8000            |
+| `--api-key`              | Optional API key for client authentication to optillm           | `""`            |
+
+When using Docker, these can be set as environment variables prefixed with `OPTILLM_`.
+
+## Running with Docker
+
+optillm can optionally be built and run using Docker and the provided [Dockerfile](./Dockerfile).
+
+### Using Docker Compose
+
+1. Make sure you have Docker and Docker Compose installed on your system.
+
+2. Either update the environment variables in the docker-compose.yaml file or create a `.env` file in the project root directory and add any environment variables you want to set. For example, to set the OpenAI API key, add the following line to the `.env` file:
+
+   ```bash
+   OPENAI_API_KEY=your_openai_api_key_here
+   ```
+
+3. Run the following command to start optillm:
+
+   ```bash
+   docker compose up -d
+   ```
+
+   This will build the Docker image if it doesn't exist and start the optillm service.
+
+4. optillm will be available at `http://localhost:8000`.
+
+When using Docker, you can set these parameters as environment variables. For example, to set the approach and model, you would use:
+
+```bash
+OPTILLM_APPROACH=mcts
+OPTILLM_MODEL=gpt-4
+```
+
+To secure the optillm proxy with an API key, set the `OPTILLM_API_KEY` environment variable:
+
+```bash
+OPTILLM_API_KEY=your_secret_api_key
+```
+
+When the API key is set, clients must include it in their requests using the `Authorization` header:
+
+```plain
+Authorization: Bearer your_secret_api_key
+```
 
 ## References
 
````
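The README hunk above introduces client authentication: when `OPTILLM_API_KEY` is set on the proxy, clients must send a Bearer token in the `Authorization` header. As a minimal sketch of that header (the helper name below is illustrative, not part of optillm itself):

```python
# Hypothetical helper showing the header format the README documents when
# OPTILLM_API_KEY is set on the proxy; optillm does not ship this function.
def auth_headers(api_key: str) -> dict:
    """Build the Bearer-token Authorization header for the optillm proxy."""
    return {"Authorization": f"Bearer {api_key}"}

print(auth_headers("your_secret_api_key")["Authorization"])
# -> Bearer your_secret_api_key
```

Note that the official OpenAI client sends its `api_key` as a Bearer token automatically, so passing the proxy key as `api_key` would produce the same header.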

docker-compose.yaml

Lines changed: 38 additions & 0 deletions
```yaml
services:
  &name optillm:
    build:
      context: https://github.com/codelion/optillm.git#main
      # context: .
      dockerfile: Dockerfile
      tags:
        - optillm:latest
    image: optillm:latest
    container_name: *name
    hostname: *name
    ports:
      - "8000:8000"
    environment:
      OPENAI_API_KEY: ${OPENAI_API_KEY:-""}
      OPTILLM_BASE_URL: ${OPENAI_BASE_URL:-"https://api.openai.com/v1"}
      # OPTILLM_API_KEY: ${OPTILLM_API_KEY:-} # optionally sets an API key for Optillm clients
      # Uncomment and set values for other arguments (prefixed with OPTILLM_) as needed, e.g.:
      # OPTILLM_APPROACH: auto
      # OPTILLM_MODEL: gpt-4o-mini
      # OPTILLM_SIMULATIONS: 2
      # OPTILLM_EXPLORATION: 0.2
      # OPTILLM_DEPTH: 1
      # OPTILLM_BEST_OF_N: 3
      # OPTILLM_RSTAR_MAX_DEPTH: 3
      # OPTILLM_RSTAR_NUM_ROLLOUTS: 5
      # OPTILLM_RSTAR_C: 1.4
      # OPTILLM_N: 1
      # OPTILLM_RETURN_FULL_RESPONSE: false
      # OPTILLM_PORT: 8000
    restart: on-failure
    stop_grace_period: 2s
    healthcheck:
      test: ["CMD", "curl", "-f", "http://127.0.0.1:8000/health"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 3s
```
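The compose file configures optillm through `OPTILLM_`-prefixed environment variables that mirror the CLI flags (e.g. `--best-of-n` becomes `OPTILLM_BEST_OF_N`). A minimal sketch of that naming convention, assuming the straightforward flag-to-env mapping implied by the diff (the helper below is illustrative, not optillm's actual code):

```python
def flag_to_env(flag: str) -> str:
    """Map a CLI flag like "--rstar-max-depth" to its Docker environment
    variable name, "OPTILLM_RSTAR_MAX_DEPTH" (illustrative helper)."""
    return "OPTILLM_" + flag.lstrip("-").replace("-", "_").upper()

print(flag_to_env("--best-of-n"))  # -> OPTILLM_BEST_OF_N
```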
