Merge method reference: *Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch* (arXiv:2311.03099).
This merge takes inspiration from Dampfinchen/Llama-3.1-8B-Ultra-Instruct and brucethemoose. The goal is an abliterated, conversational model within 8B parameters that stays coherent over long conversations. With "Ultra-Instruct" (which struggles with grammar and conversational coherence) as the baseline for comparison, preliminary results suggest these goals are met. Expect responses in Markdown format by default.
This model was merged using the DARE TIES merge method, with meta-llama/Llama-3.1-8B-Instruct as the base.
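For intuition, here is a minimal, conceptual sketch of the DARE-TIES idea applied to a single weight tensor: each fine-tuned model's delta from the base is randomly sparsified and rescaled (DARE), conflicting parameter signs are resolved by a weighted majority vote (TIES), and the surviving deltas are summed back onto the base. The function names are illustrative and this is a simplification, not mergekit's actual implementation.

```python
import torch

def dare_sparsify(delta: torch.Tensor, density: float) -> torch.Tensor:
    """DARE: randomly drop (1 - density) of the delta parameters and
    rescale the survivors by 1/density so the expected delta is preserved."""
    mask = torch.bernoulli(torch.full_like(delta, density))
    return delta * mask / density

def dare_ties_merge(base: torch.Tensor,
                    finetuned: list[torch.Tensor],
                    densities: list[float],
                    weights: list[float]) -> torch.Tensor:
    """Simplified DARE-TIES merge of one tensor across several fine-tuned models."""
    # 1. Task vectors: per-model deltas from the base, DARE-sparsified and weighted.
    deltas = torch.stack([dare_sparsify(ft - base, d) * w
                          for ft, d, w in zip(finetuned, densities, weights)])
    # 2. TIES sign election: the per-parameter sign with the larger weighted mass wins.
    elected_sign = torch.sign(deltas.sum(dim=0))
    # 3. Keep only deltas that agree with the elected sign, then sum onto the base.
    agree = torch.sign(deltas) == elected_sign
    return base + (deltas * agree).sum(dim=0)
```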
The following models were included in the merge:

* akjindal53244/Llama-3.1-Storm-8B
* arcee-ai/Llama-3.1-SuperNova-Lite
* Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2
The following YAML configuration was used to produce this model:
```yaml
base_model: meta-llama/Llama-3.1-8B-Instruct
dtype: bfloat16
merge_method: dare_ties
parameters:
  int8_mask: 1.0
slices:
- sources:
  - layer_range: [0, 32]
    model: akjindal53244/Llama-3.1-Storm-8B
    parameters:
      density: 0.7
      weight: 0.2
  - layer_range: [0, 32]
    model: arcee-ai/Llama-3.1-SuperNova-Lite
    parameters:
      density: 0.7
      weight: 0.3
  - layer_range: [0, 32]
    model: Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2
    parameters:
      density: 0.7
      weight: 0.5
  - layer_range: [0, 32]
    model: meta-llama/Llama-3.1-8B-Instruct
tokenizer_source: meta-llama/Llama-3.1-8B-Instruct
```
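The merged checkpoint is a standard Llama-3.1 model, so it should load like any other instruct model with 🤗 Transformers. A minimal usage sketch (the repository id below is a placeholder for this model's actual name):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id -- replace with the actual name of this merged model.
model_id = "your-username/this-merged-model"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # matches the dtype used for the merge
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful, conversational assistant."},
    {"role": "user", "content": "Explain what a model merge is in two sentences."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```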
Detailed Open LLM Leaderboard evaluation results can be found here.
| Metric | Value |
|---|---|
| Avg. | 29.59 |
| IFEval (0-shot) | 79.41 |
| BBH (3-shot) | 31.39 |
| MATH Lvl 5 (4-shot) | 19.18 |
| GPQA (0-shot) | 6.82 |
| MuSR (0-shot) | 8.57 |
| MMLU-PRO (5-shot) | 32.14 |