Skip to content

Commit 74e0ae1

Browse files
authored
Merge pull request #250 from CerebrasResearch/cepo_2025Q3
Cepo 2025 Q3
2 parents 770bf09 + ba45c60 commit 74e0ae1

File tree

7 files changed

+670
-157
lines changed

7 files changed

+670
-157
lines changed

README.md

Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -490,12 +490,16 @@ optillm supports various command-line arguments for configuration. When using Do
490490
| `--cepo_planning_m` | Number of attempts to generate n plans in planning stage | 6 |
491491
| `--cepo_planning_temperature_step1` | Temperature for generator in step 1 of planning stage | 0.55 |
492492
| `--cepo_planning_temperature_step2` | Temperature for generator in step 2 of planning stage | 0.25 |
493+
| `--cepo_planning_temperature_direct_resp` | Temperature for generator after step 2 if planning fails and answer directly | 0.1 |
493494
| `--cepo_planning_temperature_step3` | Temperature for generator in step 3 of planning stage | 0.1 |
494495
| `--cepo_planning_temperature_step4` | Temperature for generator in step 4 of planning stage | 0 |
495496
| `--cepo_planning_max_tokens_step1` | Maximum number of tokens in step 1 of planning stage | 4096 |
496497
| `--cepo_planning_max_tokens_step2` | Maximum number of tokens in step 2 of planning stage | 4096 |
498+
| `--cepo_planning_max_tokens_direct_resp` | Maximum number of tokens after step 2 if planning fails and answer directly | 4096 |
497499
| `--cepo_planning_max_tokens_step3` | Maximum number of tokens in step 3 of planning stage | 4096 |
498500
| `--cepo_planning_max_tokens_step4` | Maximum number of tokens in step 4 of planning stage | 4096 |
501+
| `--cepo_use_reasoning_fallback` | Whether to fallback to lower levels of reasoning when higher level fails | False |
502+
| `--cepo_num_of_retries` | Number of retries if llm call fails, 0 for no retries | 0 |
499503
| `--cepo_print_output` | Whether to print the output of each stage | `False` |
500504
| `--cepo_config_file` | Path to CePO configuration file | `None` |
501505
| `--cepo_use_plan_diversity` | Use additional plan diversity step | `False` |
@@ -584,6 +588,19 @@ Authorization: Bearer your_secret_api_key
584588

585589
¹ Numbers in parentheses for LongCePO indicate accuracy of majority voting from 5 runs.
586590

591+
### CePO on math and code benchmarks (Sep 2025)
592+
593+
| Method | AIME 2024 | AIME 2025 | GPQA | LiveCodeBench |
594+
| ----------------------: | :-------: | :-------: | :----: | :-----------: |
595+
| Qwen3 8B | 74.0 | 68.3 | 59.3 | 55.7 |
596+
| CePO (using Qwen3 8B) | 86.7 | 80.0 | 62.5 | 60.5 |
597+
| Qwen3 32B | 81.4 | 72.9 | 66.8 | 65.7 |
598+
| CePO (using Qwen3 32B) | **90.7** | **83.3** | 70.0 | **71.9** |
599+
| Qwen3 235B | 85.7 | 81.5 | 71.1 | 70.7 |
600+
| DeepSeek R1 | 79.8 | 70.0 | 71.5 | 64.3 |
601+
| OpenAI o3-mini | 79.6 | 74.8 | 76.8 | 66.3 |
602+
| Grok3 Think | 83.9 | 77.3 |**80.2**| 70.6 |
603+
587604
### CePO on math and code benchmarks (Mar 2025)
588605

589606
| Method | Math-L5 | MMLU-Pro (Math) | CRUX | LiveCodeBench (pass@1) | Simple QA |

optillm/cepo/README.md

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -23,6 +23,19 @@ The model reviews all generated solution proposals and their associated plans, i
2323
**Step 4**: Final Solution
2424
The model uses the refined plan from Step 3 to produce the final answer.
2525

26+
## Example Usage
27+
28+
Here’s an example of running Optillm using the CePO method for Qwen3 deployed with VLLM on port 8001:
29+
30+
```bash
31+
OPENAI_API_KEY=serving-on-vllm \
32+
python optillm.py \
33+
--base-url http://localhost:8001/v1 \
34+
--approach cepo \
35+
--port 8000 \
36+
--cepo_config_file ./optillm/cepo/cepo_configs/cepo_qwen3.yaml
37+
```
38+
2639
## CePO Current Status
2740

2841
This project is a work in progress, and the provided code is in an early experimental stage. While the proposed approach works well across the benchmarks we tested, further improvements can be achieved by task-specific customizations to prompts.

0 commit comments

Comments
 (0)