---
datasets:
- zwhe99/DeepMath-103K
base_model:
- openai/gpt-oss-120b
---
# AutoDeco
Official Implementation of "[The End of Manual Decoding: Towards Truly End-to-End Language Models](https://arxiv.org/abs/2510.26697)"

**AutoDeco** is a framework that adds token-level adaptive decoding parameter prediction to Large Language Models (LLMs). By adding lightweight prediction heads on top of a pre-trained model, AutoDeco dynamically predicts the optimal temperature and top-p for each token during decoding.
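
To make this concrete, here is a minimal sketch of what token-level decoding looks like at sampling time, assuming the heads have already produced a `temperature` and `top_p` value for the current step. Names are illustrative, not the repo's API:

```python
import torch

def sample_next_token(logits, temperature, top_p):
    """Sample one token using the *predicted* temperature and top-p for this step."""
    probs = torch.softmax(logits / temperature, dim=-1)

    # Nucleus (top-p) filtering: keep the smallest set of tokens whose
    # cumulative probability reaches top_p, then renormalize.
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    keep = cumulative - sorted_probs < top_p  # always keeps the top token
    sorted_probs[~keep] = 0.0
    sorted_probs /= sorted_probs.sum()

    return sorted_idx[torch.multinomial(sorted_probs, num_samples=1)]

# Because the heads emit fresh values at every step, temperature and top_p
# vary token by token instead of being fixed for the whole generation.
next_token = sample_next_token(torch.randn(32000), temperature=0.8, top_p=0.95)
```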

## 🎯 Key Features

- **Token-Level Decoding Parameter Prediction**: Dynamically predicts decoding parameters (temperature and top-p) for each generated token
- **Lightweight Design**: Adds only two small MLP prediction heads (~5MB), without modifying the base model
- **Universal Architecture**: Supports multiple mainstream LLM architectures (Llama, Qwen2/2.5, Qwen3, MoE models, etc.)
- **End-to-End Training**: Trained entirely through the cross-entropy loss, with gradients flowing implicitly into the heads
- **Flexible Training**: Supports training the temperature head and the top-p head independently or jointly
- **Efficient Deployment**: Only the AutoDeco head weights are saved during training; they are merged with the base model for decoding

## 🏗️ Architecture

The AutoDeco framework consists of two core components:

![AutoDeco Architecture](figure/arch.png)

### Model Workflow

```
Input Tokens
     ↓
Base LLM (frozen during head training)
     ↓
Hidden States
     ├──→ LM Head  → Logits
     ├──→ TempHead → Temperature
     └──→ TopPHead → Top-P
```

During training, the base LLM parameters are frozen, and only the two prediction heads are trained.
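
As a rough illustration, the sketch below shows what one of these heads could look like in PyTorch. The layer sizes and activation are assumptions (chosen so a head lands near the ~1M-parameter scale of the table in the next section), not the repo's exact module:

```python
import torch
import torch.nn as nn

class AutoDecoHead(nn.Module):
    """Small MLP mapping each hidden state to one scalar per token (illustrative)."""

    def __init__(self, hidden_size=4096, head_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(hidden_size, head_dim),
            nn.SiLU(),
            nn.Linear(head_dim, 1),
        )

    def forward(self, hidden_states):               # (batch, seq, hidden)
        return self.mlp(hidden_states).squeeze(-1)  # (batch, seq)

# Stand-in for base-LLM hidden states; only the heads would receive gradients.
hidden = torch.randn(1, 8, 4096)
temp_head, top_p_head = AutoDecoHead(), AutoDecoHead()
temperature = nn.functional.softplus(temp_head(hidden))  # positive
top_p = torch.sigmoid(top_p_head(hidden))                # in (0, 1)
```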

## 🤖 Supported Models

AutoDeco supports all current autoregressive LLMs; the architectures below are unified under a single `AutoDecoModelForCausalLM` interface.

<div align="center">

| **Base Model** | **#Base Params** | **#AutoDeco Params** | **Download** |
| :------------: | :------------: | :------------: | :------------: |
| Llama-3.1-Nemotron-Nano-8B-v1 | 8B | 2.1M | [🤗 HuggingFace](https://huggingface.co/Jadeislaw/AutoDeco-Llama-Nemotron-8B) |
| DeepSeek-R1-Distill-Qwen-7B | 7B | 1.84M | [🤗 HuggingFace](https://huggingface.co/Jadeislaw/AutoDeco-R1-Distill-Qwen-7B) |
| Qwen3-30B-A3B-Instruct-2507 | 30B | 1.05M | [🤗 HuggingFace](https://huggingface.co/Jadeislaw/AutoDeco-Qwen3-30B-A3B-Instruct-2507) |
| OpenAI-GPT-OSS-20B | 20B | 1.48M | [🤗 HuggingFace](https://huggingface.co/Jadeislaw/AutoDeco-GPT-Oss-20B) |
| OpenAI-GPT-OSS-120B | 120B | 1.48M | [🤗 HuggingFace](https://huggingface.co/Jadeislaw/AutoDeco-GPT-Oss-120B) |
| Qwen3-235B-A22B-Thinking | 235B | 2.1M | [🤗 HuggingFace](https://huggingface.co/zacks917/AutoDeco-Qwen3-235B-A22B-Thinking-2507) |
| DeepSeek-V3.1-Terminus | 671B | - | Coming Soon |

</div>

## 🚀 Installation

### Recommended Requirements

- Python >= 3.10
- PyTorch >= 2.0
- CUDA >= 12.0 (recommended for training)

### Install Dependencies

```bash
# Clone the repository, then enter it
cd AutoDeco

# Install core dependencies
pip install -r requirements.txt

# Optional: for training monitoring
pip install wandb
```

## 💡 Quick Start

### Initialize AutoDeco Model

```bash
python script/construct_autodeco.py \
    --base_model_name_or_path path_to_your_base_LLM \
    --output_dir path_to_your_AutoDeco_model
```

<!-- ### 2. Inference

```python
from transformers import AutoTokenizer

# `model` is an AutoDeco model loaded beforehand (see Quick Start)
tokenizer = AutoTokenizer.from_pretrained("path/to/model")
inputs = tokenizer("What is the meaning of life?", return_tensors="pt")

# Forward pass to get predictions
outputs = model(**inputs)

# outputs contains:
# - outputs.logits: Regular language model logits
# - outputs.temp_logits: Predicted temperature values
# - outputs.top_p_logits: Predicted top-p values
```

### 3. Efficient Inference with vLLM

We have integrated AutoDeco with vLLM for efficient batch inference:

- Install vLLM from source first
```bash
cd vllm
pip install -e .
```

- Inference
```bash
python llm_eval.py \
    --model_name_or_path path/to/autodeco_model \
    --dataset aime24 \
    --temp 1.0 \
    --top_p 1.0 \
    --k 16 \
    --tp_size 4
``` -->

## 🔥 Training

### Prepare Training Data

Training data should be in JSONL format, with one sample per line. AutoDeco supports the standard conversation format:

```json
{
    "prompt": "formatted prompt text",
    "completion": "expected completion"
}
```

For example:

```json
{
    "prompt": "<|im_start|>user\nEvaluate the limit:$$\\lim_{(x, y) \\to (1, 2)} \\frac{(x-1)(y-2)-x+3}{x^2-2x+y^2-4}$$\nMake sure you output the final answer within \\boxed{}<|im_end|>\n<|im_start|>assistant\n",
    "completion": "......### ✅ Final Answer:\n$$\n\\boxed{-1}\n$$"
}
```

### Train AutoDeco Heads

Use the provided training script:

```bash
# Edit script/trl_train.sh to configure parameters
# Key parameters:
# - MODEL_NAME_OR_PATH: Your initialized AutoDeco model path
# - DATA_NAME: Training data filename (in the data directory)
# - MAX_LENGTH: Maximum sequence length
# - train_temp: Whether to train the temperature head
# - train_top_p: Whether to train the top-p head

bash script/trl_train.sh
```

Training configuration example:

```bash
# Train only the temperature head
accelerate launch trl_train.py \
    --model_name_or_path AutoDeco-Llama-3.1-8B \
    --dataset_name train_data.jsonl \
    --train_temp true \
    --train_top_p false \
    --learning_rate 5e-6 \
    --num_train_epochs 1 \
    --output_dir ckpt/llama3_temp_head
```
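
To see why plain cross-entropy suffices to train the heads, note that the predicted temperature rescales the logits *inside* the loss, so the gradient reaches the head even though the base model is frozen. A minimal, self-contained sketch with illustrative shapes:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(2, 16, 32000)                        # frozen base model output
temperature = (torch.rand(2, 16) + 0.5).requires_grad_()  # stand-in for head output

# Temperature enters the loss differentiably, so d(loss)/d(temperature) exists.
scaled = logits / temperature.unsqueeze(-1)
labels = torch.randint(0, 32000, (2, 16))
loss = F.cross_entropy(scaled.flatten(0, 1), labels.flatten())
loss.backward()

print(temperature.grad.shape)  # torch.Size([2, 16]): gradient reaches the head
```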

## 📊 Inference

### Batch Evaluation with vLLM

```bash
# Single evaluation
python llm_eval.py \
    --model_name_or_path ckpt/autodeco_model \
    --dataset aime24 \
    --temp 1.0 \
    --top_p 1.0 \
    --k 16 \
    --seed 42

# Batch evaluation with the script (automatically runs multiple random seeds)
bash script/test_generation.sh aime24 1.0 1.0 -1 1.0 path/to/model
```

Evaluation results are saved in the `generation_log/` directory, including:
- Pass@K metrics
- Average accuracy
- Detailed generation results for each sample
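
If you want to recompute Pass@K from the saved logs, the standard unbiased estimator (Chen et al., 2021) is straightforward; note the repo's exact computation may differ:

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k: chance that at least one of k samples drawn from
    n total generations (c of them correct) is correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 16 generations per problem (--k 16), 6 of them correct:
print(pass_at_k(n=16, c=6, k=8))
```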

### Deploy with vLLM

```bash
# Example: serve a merged AutoDeco checkpoint (see "Merge AutoDeco Heads" below)
vllm serve path_to_your_full_model
```
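
Since vLLM exposes an OpenAI-compatible server (by default at `http://localhost:8000/v1`), the served model can be queried with the standard `openai` client; the model name and prompt below are placeholders:

```python
from openai import OpenAI

# vLLM's server speaks the OpenAI API; the key is ignored but required.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.completions.create(
    model="path_to_your_full_model",  # the name registered by `vllm serve`
    prompt="Evaluate 2 + 2 and put the answer in \\boxed{}.",
    max_tokens=128,
)
print(response.choices[0].text)
```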

## 📁 Project Structure
```
AutoDeco/
├── model/                    # Model definitions
│   └── templlm_auto.py       # Unified AutoDeco model (recommended)
├── trainer/                  # Trainers
│   └── trl_Temp.py           # AutoDeco trainer
├── script/                   # Scripts
│   ├── trl_train.sh          # Training launch script
│   ├── test_generation.sh    # Batch evaluation script
│   └── merge_autodeco.py     # Merge or split heads
├── config/                   # Configuration files
│   └── deepspeed/            # DeepSpeed configuration
│       └── deepspeed_zero3_gradaccu4.yaml
├── trl_train.py              # Training main program
├── llm_eval.py               # Evaluation main program (vLLM)
├── boxed_extract.py          # Answer extraction tool
├── requirements.txt          # Dependencies
└── README.md                 # This document
```

## 🔧 Advanced Usage

### 1. Extract AutoDeco Heads from an AutoDeco Model

```bash
python merge_autodeco.py split \
    --full-checkpoint path_to_your_full_model \
    --output path_to_split_head
```

This generates a lightweight checkpoint (~5MB) containing:
- `config.json`: AutoDeco configuration (including `base_model_name_or_path`)
- `autodeco_heads.safetensors`: Head weights
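
To sanity-check an extracted checkpoint, you can list the stored head tensors with `safetensors`; the exact tensor names depend on the head implementation, so this sketch only inspects them:

```python
from safetensors.torch import load_file

# Load the extracted head weights and list what they contain.
heads = load_file("path_to_split_head/autodeco_heads.safetensors")
for name, tensor in heads.items():
    print(name, tuple(tensor.shape))
print("total params:", sum(t.numel() for t in heads.values()))
```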

### 2. Merge AutoDeco Heads into the Base Model (for vLLM Deployment)

If you need a complete model checkpoint with heads for inference engines like vLLM:

```bash
python merge_autodeco.py merge \
    --autodeco-path path_to_autodeco_heads \
    --base-model-path path_to_base_LLM \
    --output path_to_your_full_model
```

## 📝 Citation

If you use AutoDeco in your research, please cite:

```bibtex
@misc{wang2025endmanualdecodingtruly,
      title={The End of Manual Decoding: Towards Truly End-to-End Language Models},
      author={Zhichao Wang and Dongyang Ma and Xinting Huang and Deng Cai and Tian Lan and Jiahao Xu and Haitao Mi and Xiaoying Tang and Yan Wang},
      year={2025},
      eprint={2510.26697},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2510.26697},
}
```

<!-- ## Acknowledgments

- Built on [Transformers](https://github.com/huggingface/transformers) and [TRL](https://github.com/huggingface/trl)
- Training framework uses [DeepSpeed](https://github.com/microsoft/DeepSpeed)
- Inference optimization uses [vLLM](https://github.com/vllm-project/vllm) -->