# AutoThink

AutoThink is an adaptive thinking approach for Large Language Models that combines query complexity classification with steering vector guidance to enhance model reasoning capabilities.

## Overview

AutoThink combines several advanced techniques to optimize the thinking process of LLMs:

1. **Query Complexity Classification**: Uses an adaptive classifier to determine whether a query requires HIGH or LOW complexity reasoning
2. **Token Budget Allocation**: Dynamically allocates thinking tokens based on query complexity
3. **Steering Vector Guidance**: Applies activation-based steering vectors to guide the model's reasoning process
4. **Controlled Thinking Process**: Manages explicit thinking phases with start and end tokens

## How It Works

### 1. Query Classification

AutoThink uses the [`adaptive-classifier/llm-router`](https://huggingface.co/adaptive-classifier/llm-router) model to classify incoming queries:

- **HIGH**: Complex queries requiring deep reasoning, multi-step calculations, or thorough exploration
- **LOW**: Simpler queries requiring less extensive reasoning
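The routing step can be sketched as follows. `classify_complexity` is a hypothetical stand-in for the actual classifier call; the real system loads `adaptive-classifier/llm-router` and scores the query with it rather than using keywords.

```python
# Sketch of the routing step. `classify_complexity` is a hypothetical
# stand-in: a toy keyword heuristic instead of the real
# adaptive-classifier/llm-router model.

def classify_complexity(query: str) -> str:
    """Toy heuristic returning HIGH or LOW (placeholder for the model)."""
    reasoning_cues = ("prove", "derive", "step by step", "calculate", "why")
    if any(cue in query.lower() for cue in reasoning_cues):
        return "HIGH"
    return "LOW"

print(classify_complexity("Derive the closed form of 1 + 2 + ... + n"))  # HIGH
print(classify_complexity("What is the capital of France?"))             # LOW
```

The downstream pipeline only depends on the HIGH/LOW label, so any classifier with this output contract can be swapped in.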
### 2. Token Budget

Based on the classification, AutoThink allocates different token budgets for the thinking phase:

- **HIGH**: 70-90% of max tokens allocated for thinking
- **LOW**: 20-40% of max tokens allocated for thinking
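The allocation above can be sketched as a small function. The 70-90% and 20-40% ranges come from the text; picking the midpoint of each range is an assumption made here for determinism.

```python
# Sketch of token budget allocation using the ranges from the text.
# Taking the midpoint of each range is an assumption; the real
# allocation may pick any value in the range.

def thinking_budget(complexity: str, max_tokens: int) -> int:
    """Return the number of tokens reserved for the thinking phase."""
    low, high = (0.70, 0.90) if complexity == "HIGH" else (0.20, 0.40)
    fraction = (low + high) / 2  # midpoint of the configured range
    return int(max_tokens * fraction)

print(thinking_budget("HIGH", 4096))  # 3276
print(thinking_budget("LOW", 4096))   # 1228
```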
### 3. Steering Vectors

AutoThink uses pre-extracted steering vectors from [datasets](https://huggingface.co/datasets?other=pts) like `codelion/Qwen3-0.6B-pts-steering-vectors`. These vectors represent different reasoning patterns:

- **Depth and thoroughness**: Encourages detailed, step-by-step reasoning
- **Numerical accuracy**: Promotes precise calculations and verification
- **Self-correction**: Facilitates error detection and correction
- **Exploration**: Supports considering multiple approaches
- **Organization**: Improves logical structure in responses

During inference, the model's internal activations are modified based on these vectors to enhance specific reasoning capabilities.
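The core of activation steering is simple arithmetic: add a scaled steering vector to the hidden state at the target layer. The sketch below shows only that arithmetic; the hook mechanism (e.g. a forward hook on the configured layer) and the tensor shapes are assumptions not detailed in this README.

```python
# Minimal sketch of activation steering: h' = h + strength * v.
# Plain Python lists stand in for the model's hidden-state tensors;
# the hook that injects this into the target layer is not shown.

def steer(hidden: list, vector: list, strength: float) -> list:
    """Apply one steering vector to one hidden-state vector."""
    return [h + strength * v for h, v in zip(hidden, vector)]

hidden = [0.5, -1.0, 2.0]
depth_vector = [0.1, 0.2, -0.1]  # e.g. the depth_and_thoroughness pattern
steered = steer(hidden, depth_vector, strength=2.5)
print(steered)
```

The per-pattern `strength` corresponds to the `pattern_strengths` entries in the configuration below: larger values push activations further along that reasoning direction.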
### 4. Controlled Thinking Process

The generation process includes:

1. A thinking phase marked by `<think>` and `</think>` tokens
2. Automatic adjustment of thinking length based on query complexity
3. Dynamic application of steering vectors
4. Graceful transition to the final response
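The steps above can be sketched as a decoding loop that enforces the token budget. `sample_next_token` is a hypothetical stand-in for the model's decoding step; the real implementation also applies steering vectors inside this loop.

```python
# Sketch of the controlled thinking loop. `sample_next_token` is a
# hypothetical stand-in for one model decoding step. The loop lets the
# model close its own thinking phase, but forces `</think>` once the
# token budget is exhausted.

def generate_with_thinking(sample_next_token, budget: int) -> list:
    tokens = ["<think>"]
    for _ in range(budget):
        tok = sample_next_token()
        if tok == "</think>":  # model ended its thinking phase itself
            break
        tokens.append(tok)
    tokens.append("</think>")  # exactly one closing token either way
    return tokens

# Toy model that never stops thinking on its own:
out = generate_with_thinking(lambda: "step", budget=3)
print(out)  # ['<think>', 'step', 'step', 'step', '</think>']
```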
## Configuration

AutoThink can be configured with:

```python
{
    "model_name": "your-model-name",
    "classifier_model": "adaptive-classifier/llm-router",
    "steering_dataset": "codelion/Qwen3-0.6B-pts-steering-vectors",
    "target_layer": 19,  # Layer to apply steering vectors
    "high_complexity_min_tokens": 1024,
    "high_complexity_max_tokens": 4096,
    "low_complexity_min_tokens": 256,
    "low_complexity_max_tokens": 1024,
    "pattern_strengths": {
        "depth_and_thoroughness": 2.5,  # Steering strength per pattern
        "numerical_accuracy": 2.0,
        "self_correction": 3.0,
        "exploration": 2.0,
        "organization": 1.5
    }
}
```

## Usage

```python
from optillm.autothink import autothink_decode

response = autothink_decode(
    model,
    tokenizer,
    messages,
    {
        "steering_dataset": "codelion/Qwen3-0.6B-pts-steering-vectors",
        "target_layer": 19
    }
)
```

## Benefits

- **Adaptive Resource Usage**: Models think more on complex problems and less on simple ones
- **Enhanced Reasoning**: Steering vectors guide the model toward better reasoning patterns
- **Efficiency**: Better performance without increasing model size
- **Customizability**: Can be tailored for different domains using domain-specific steering vector datasets

## Citation

If you use this approach in your research, please cite:

```bibtex
@article{autothink,
  title={AutoThink: efficient inference for reasoning LLMs},
  author={Sharma, Asankhaya},
  journal={SSRN Artificial Intelligence eJournal},
  year={2025},
  url={https://dx.doi.org/10.2139/ssrn.5253327}
}
```