
Commit 6096253

Merge pull request #184 from codelion/feat-autothink-decoding
Feat autothink decoding
2 parents 2ab4e6e + ce9277a commit 6096253

File tree

12 files changed: +1808 -3 lines


README.md

Lines changed: 13 additions & 1 deletion
```diff
@@ -343,7 +343,7 @@ Check this log file for connection issues, tool execution errors, and other diag
 
 | Approach | Slug | Description |
 | ------------------------------------ | ------------------ | ---------------------------------------------------------------------------------------------- |
-| Cerebras Planning and Optimization | `cepo` | Combines Best of N, Chain-of-Thought, Self-Reflection, Self-Improvement, and various prompting techniques |
+| Cerebras Planning and Optimization | `cepo` | Combines Best of N, Chain-of-Thought, Self-Reflection, Self-Improvement, and various prompting techniques |
 | CoT with Reflection | `cot_reflection` | Implements chain-of-thought reasoning with \<thinking\>, \<reflection> and \<output\> sections |
 | PlanSearch | `plansearch` | Implements a search algorithm over candidate plans for solving a problem in natural language |
 | ReRead | `re2` | Implements rereading to improve reasoning by processing queries twice |
@@ -359,6 +359,7 @@ Check this log file for connection issues, tool execution errors, and other diag
 | CoT Decoding | N/A for proxy | Implements chain-of-thought decoding to elicit reasoning without explicit prompting |
 | Entropy Decoding | N/A for proxy | Implements adaptive sampling based on the uncertainty of tokens during generation |
 | Thinkdeeper | N/A for proxy | Implements the `reasoning_effort` param from OpenAI for reasoning models like DeepSeek R1 |
+| AutoThink | N/A for proxy | Combines query complexity classification with steering vectors to enhance reasoning |
 
 ## Implemented plugins
 
@@ -467,6 +468,16 @@ Authorization: Bearer your_secret_api_key
 
 ## SOTA results on benchmarks with optillm
 
+### AutoThink on GPQA-Diamond & MMLU-Pro (May 2025)
+
+| **Model** | **GPQA-Diamond** | | **MMLU-Pro** | |
+|----------------|-----------------------------|--------------------------|----------------------------|--------------------------|
+| | Accuracy (%) | Avg. Tokens | Accuracy (%) | Avg. Tokens |
+| DeepSeek-R1-Distill-Qwen-1.5B | 21.72 | 7868.26 | 25.58 | 2842.75 |
+| with Fixed Budget | 28.47 | 3570.00 | 26.18 | 1815.67 |
+| **with AutoThink** | **31.06** | **3520.52** | **26.38** | **1792.50** |
+
+
 ### LongCePO on LongBench v2 (Apr 2025)
 
 | Model¹ | Context window | Short samples (up to 32K words) | Medium samples (32–128K words) |
@@ -551,6 +562,7 @@ called patchflows. We saw huge performance gains across all the supported patchf
 ![Results showing optillm mixture of agents approach used with patchflows](https://raw.githubusercontent.com/codelion/optillm/main/moa-patchwork-results.png)
 
 ## References
+- [AutoThink: efficient inference for reasoning LLMs](https://dx.doi.org/10.2139/ssrn.5253327) - [Implementation](optillm/autothink)
 - [CePO: Empowering Llama with Reasoning using Test-Time Compute](https://cerebras.ai/blog/cepo) - [Implementation](optillm/cepo)
 - [LongCePO: Empowering LLMs to efficiently leverage infinite context](https://cerebras.ai/blog/longcepo) - [Implementation](optillm/plugins/longcepo)
 - [Chain of Code: Reasoning with a Language Model-Augmented Code Emulator](https://arxiv.org/abs/2312.04474) - [Inspired the implementation of coc plugin](optillm/plugins/coc_plugin.py)
```

optillm/__init__.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -2,7 +2,7 @@
 import os
 
 # Version information
-__version__ = "0.1.11"
+__version__ = "0.1.12"
 
 # Get the path to the root optillm.py
 spec = util.spec_from_file_location(
```

optillm/autothink/README.md

Lines changed: 110 additions & 0 deletions
# AutoThink

AutoThink is an adaptive thinking approach for large language models (LLMs) that combines query complexity classification with steering vector guidance to enhance model reasoning capabilities.

## Overview

AutoThink combines several techniques to optimize the thinking process of LLMs:

1. **Query Complexity Classification**: Uses an adaptive classifier to determine whether a query requires HIGH or LOW complexity reasoning
2. **Token Budget Allocation**: Dynamically allocates thinking tokens based on query complexity
3. **Steering Vector Guidance**: Applies activation-based steering vectors to guide the model's reasoning process
4. **Controlled Thinking Process**: Manages explicit thinking phases with start and end tokens

## How It Works

### 1. Query Classification

AutoThink uses the `adaptive-classifier/llm-router` [model](https://huggingface.co/adaptive-classifier/llm-router) to classify incoming queries:

- **HIGH**: Complex queries requiring deep reasoning, multi-step calculations, or thorough exploration
- **LOW**: Simpler queries requiring less extensive reasoning
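A rough sketch of this routing step, assuming the `adaptive-classifier` Python package and its `from_pretrained`/`predict` API; the exact integration inside optillm may differ:

```python
# Rough sketch of the routing step, assuming the adaptive-classifier
# package (pip install adaptive-classifier); the exact integration
# inside optillm may differ.
from adaptive_classifier import AdaptiveClassifier

classifier = AdaptiveClassifier.from_pretrained("adaptive-classifier/llm-router")

query = "Prove that the product of two consecutive integers is always even."
predictions = classifier.predict(query)  # e.g. [("HIGH", 0.87), ("LOW", 0.13)]
label, confidence = predictions[0]       # highest-scoring label first
```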
### 2. Token Budget

Based on the classification, AutoThink allocates different token budgets for the thinking phase:

- **HIGH**: 70-90% of max tokens allocated for thinking
- **LOW**: 20-40% of max tokens allocated for thinking
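A minimal sketch of how such a split could be computed, using the midpoints of the percentage bands above for illustration; the actual budgets come from the `*_min_tokens`/`*_max_tokens` settings in the configuration section below:

```python
def allocate_thinking_budget(label: str, max_tokens: int) -> int:
    """Illustrative mapping from complexity label to a thinking-token budget.

    Uses the midpoint of the percentage bands above; the actual
    implementation works from the configured *_min_tokens / *_max_tokens.
    """
    band = (0.70, 0.90) if label == "HIGH" else (0.20, 0.40)
    midpoint = sum(band) / 2
    return int(max_tokens * midpoint)

# Example: a HIGH-complexity query with a 4096-token generation limit
# gets roughly 0.8 * 4096 thinking tokens.
print(allocate_thinking_budget("HIGH", 4096))  # 3276
```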
### 3. Steering Vectors

AutoThink uses pre-extracted steering vectors from [datasets](https://huggingface.co/datasets?other=pts) like `codelion/Qwen3-0.6B-pts-steering-vectors`. These vectors represent different reasoning patterns:

- **Depth and thoroughness**: Encourages detailed, step-by-step reasoning
- **Numerical accuracy**: Promotes precise calculations and verification
- **Self-correction**: Facilitates error detection and correction
- **Exploration**: Supports considering multiple approaches
- **Organization**: Improves logical structure in responses

During inference, the model's internal activations are modified based on these vectors to enhance specific reasoning capabilities.
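Mechanically, this amounts to adding a scaled steering direction to one layer's hidden states during the forward pass. A minimal PyTorch sketch with a forward hook, assuming a LLaMA/Qwen-style module tree and a `steering_vector` tensor already loaded from the dataset; the real processor selects vectors per reasoning pattern and manages hook lifetimes:

```python
import torch

def make_steering_hook(steering_vector: torch.Tensor, strength: float):
    """Build a forward hook that shifts a decoder layer's hidden states
    along a pre-extracted steering direction."""
    def hook(module, inputs, output):
        # Decoder layers typically return a tuple whose first element is
        # the hidden-state tensor of shape (batch, seq_len, hidden_dim).
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + strength * steering_vector.to(hidden.device, hidden.dtype)
        return (steered,) + output[1:] if isinstance(output, tuple) else steered
    return hook

# Attach to the configured target layer (e.g. layer 19), generate, then detach.
layer = model.model.layers[19]  # assumes a LLaMA/Qwen-style layer stack
handle = layer.register_forward_hook(make_steering_hook(steering_vector, strength=2.0))
# ... model.generate(...) ...
handle.remove()
```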
### 4. Controlled Thinking Process

The generation process includes:

1. A thinking phase marked by `<think>` and `</think>` tokens
2. Automatic adjustment of thinking time based on query complexity
3. Dynamic application of steering vectors
4. Graceful transition to the final response
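The core control idea can be sketched as a budgeted decode loop that force-closes the thinking phase when the budget runs out. This is a hypothetical simplification: `sample_next_token` stands in for the processor's sampling logic, and `prompt_ids`/`thinking_budget` come from the earlier steps:

```python
import torch

# Hypothetical sketch of budget-controlled thinking; the actual processor
# also applies steering vectors while it decodes.
start_ids = tokenizer.encode("<think>", add_special_tokens=False)
end_ids = tokenizer.encode("</think>", add_special_tokens=False)

input_ids = torch.cat([prompt_ids, torch.tensor([start_ids])], dim=-1)
for _ in range(thinking_budget):
    next_id = sample_next_token(model, input_ids)   # hypothetical helper
    input_ids = torch.cat([input_ids, next_id], dim=-1)
    if next_id.item() == end_ids[-1]:               # model ended thinking itself
        break
else:
    # Budget exhausted: append </think> so generation transitions
    # gracefully to the final response.
    input_ids = torch.cat([input_ids, torch.tensor([end_ids])], dim=-1)
```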
## Configuration

AutoThink can be configured with:

```python
{
    "model_name": "your-model-name",
    "classifier_model": "adaptive-classifier/llm-router",
    "steering_dataset": "codelion/Qwen3-0.6B-pts-steering-vectors",
    "target_layer": 19,  # Layer to apply steering vectors
    "high_complexity_min_tokens": 1024,
    "high_complexity_max_tokens": 4096,
    "low_complexity_min_tokens": 256,
    "low_complexity_max_tokens": 1024,
    "pattern_strengths": {
        "depth_and_thoroughness": 2.5,  # Steering strength for different patterns
        "numerical_accuracy": 2.0,
        "self_correction": 3.0,
        "exploration": 2.0,
        "organization": 1.5
    }
}
```
## Usage

```python
from optillm.autothink import autothink_decode

response = autothink_decode(
    model,
    tokenizer,
    messages,
    {
        "steering_dataset": "codelion/Qwen3-0.6B-pts-steering-vectors",
        "target_layer": 19
    }
)
```
## Benefits

- **Adaptive Resource Usage**: Models think more on complex problems and less on simple ones
- **Enhanced Reasoning**: Steering vectors guide the model toward better reasoning patterns
- **Efficiency**: Better performance without increasing model size
- **Customizability**: Can be tailored for different domains using domain-specific steering vector datasets

## Citation

If you use this approach in your research, please cite:

```bibtex
@article{autothink,
  title   = {AutoThink: efficient inference for reasoning LLMs},
  author  = {Sharma, Asankhaya},
  journal = {SSRN Artificial Intelligence eJournal},
  year    = {2025},
  url     = {https://dx.doi.org/10.2139/ssrn.5253327}
}
```

optillm/autothink/__init__.py

Lines changed: 7 additions & 0 deletions
```python
"""
AutoThink - Adaptive thinking approach for LLMs with query complexity classification and steering vectors.
"""

from .autothink import autothink_decode, AutoThinkProcessor

__all__ = ["autothink_decode", "AutoThinkProcessor"]
```

optillm/autothink/autothink.py

Lines changed: 91 additions & 0 deletions
```python
"""
AutoThink main implementation.

This module provides the main implementation of AutoThink, combining
query complexity classification with steering vectors to enhance reasoning.
"""

import logging
from typing import Dict, List, Any, Optional

from transformers import PreTrainedModel, PreTrainedTokenizer

from .processor import AutoThinkProcessor as InternalProcessor

logger = logging.getLogger(__name__)


class AutoThinkProcessor:
    """
    Main AutoThink processor class for external use.
    Wraps the internal processor implementation.
    """

    def __init__(self, model: PreTrainedModel, tokenizer: PreTrainedTokenizer,
                 config: Optional[Dict[str, Any]] = None):
        """
        Initialize the AutoThink processor.

        Args:
            model: Language model
            tokenizer: Model tokenizer
            config: Configuration dictionary
        """
        self.config = config or {}
        self.processor = None
        self.model = model
        self.tokenizer = tokenizer

    def __call__(self, messages: List[Dict[str, str]]) -> str:
        """Process messages with AutoThink's controlled thinking."""
        return self.process(messages)

    def process(self, messages: List[Dict[str, str]]) -> str:
        """Process messages with AutoThink's controlled thinking.

        Args:
            messages: List of message dictionaries

        Returns:
            Generated response
        """
        # Create the internal processor lazily, on first use,
        # to allow for model loading
        if self.processor is None:
            self.processor = self._create_processor()

        return self.processor.process(messages)

    def _create_processor(self):
        """Create the internal processor instance."""
        return InternalProcessor(self.config, self.tokenizer, self.model)


def autothink_decode(
    model: PreTrainedModel,
    tokenizer: PreTrainedTokenizer,
    messages: List[Dict[str, str]],
    request_config: Optional[Dict[str, Any]] = None
) -> str:
    """
    Main plugin execution function with AutoThink's controlled thinking process.

    Args:
        model: Language model
        tokenizer: Model tokenizer
        messages: List of message dictionaries
        request_config: Optional configuration dictionary

    Returns:
        Generated response with thinking process
    """
    logger.info("Starting AutoThink processing")

    # Build the effective config from the request, if provided
    config = {}
    if request_config:
        config.update(request_config)

    try:
        processor = AutoThinkProcessor(model, tokenizer, config)
        return processor.process(messages)
    except Exception as e:
        logger.error(f"Error in AutoThink processing: {str(e)}")
        raise
```
