Qwen2.5-1.5B-Instruct-lutmac
This repository contains quantized versions of Qwen2.5-1.5B-Instruct, ranging from 8-bit down to 1-bit, optimized for the LutMac inference engine.
Available Variants
| Precision | Format | Size | Description |
|---|---|---|---|
| 8-bit | .lutmac | 1.49 GB | Standard Int8 quantization. |
| 6-bit | .lutmac | 1.18 GB | Int6 quantization for high efficiency. |
| 5-bit | .lutmac | 1.03 GB | Int5 quantization balanced weight. |
| 4-bit | .lutmac | 870 MB | Optimized 4-bit quantization with tied embeddings (8-bit). |
| 3-bit | .lutmac | 714 MB | Int3 quantization for memory-constrained devices. |
| 2-bit | .lutmac | 578 MB | 2-bit quantization using Hadamard Rotation and RRQ. |
| 1.58-bit | .lutmac | 578 MB | Ternary quantization {-1, 0, +1} (Sign-Magnitude encoding). |
| 1-bit | .lutmac | 402 MB | Binary quantization {-1, +1} (Purely bit-serial). |
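The "1.58-bit" label reflects the information content of a three-valued weight: log2(3) is about 1.585 bits. The sketch below is illustrative only and is not taken from the LutMac source. It assumes an absmean scaling rule (in the style of BitNet b1.58) and a two-bit sign-magnitude code per weight; the function names `ternarize`, `encode_sign_magnitude`, and `decode_sign_magnitude` are hypothetical.

```python
import math

def ternarize(weights):
    """Map weights to {-1, 0, +1} with one shared scale (absmean rule, an assumption)."""
    scale = sum(abs(w) for w in weights) / len(weights) or 1.0
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale

def encode_sign_magnitude(codes):
    """Split ternary codes into a sign bit and a magnitude bit per weight."""
    sign = [1 if c < 0 else 0 for c in codes]        # 1 marks a negative weight
    magnitude = [1 if c != 0 else 0 for c in codes]  # 1 marks a nonzero weight
    return sign, magnitude

def decode_sign_magnitude(sign, magnitude):
    """Rebuild the ternary codes from the two bit planes."""
    return [m * (1 - 2 * s) for s, m in zip(sign, magnitude)]

weights = [0.9, -0.05, -1.2, 0.3, 0.0, -0.7]
codes, scale = ternarize(weights)
sign, magnitude = encode_sign_magnitude(codes)

print(codes)                       # [1, 0, -1, 1, 0, -1]
print(round(math.log2(3), 3))      # 1.585
assert decode_sign_magnitude(sign, magnitude) == codes
```

Under this assumed encoding each ternary weight occupies two stored bits, which would be consistent with the 1.58-bit and 2-bit files having the same size in the table.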
How to Run Inference
To run these models, you need the LutMac engine installed. You can find the source code and build instructions at: https://github.com/YASSERRMD/lutmac
1. Build the Engine
git clone https://github.com/YASSERRMD/lutmac.git
cd lutmac
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j4
2. Run the Model
Download your preferred .lutmac file and the tokenizer.json from this repository.
./lutmac-inference \
--model ./qwen2.5-1.5b-instruct-4bit.lutmac \
--tokenizer ./tokenizer.json \
--prompt "What is the capital of France?" \
--max-tokens 100 \
--streaming
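For scripted runs, for example comparing several variants on the same prompt, the command above can be assembled from Python. This is a minimal sketch that uses only the flags shown in this README; `build_command` and `run_inference` are hypothetical helper names, and treating `--streaming` as an optional switch is an assumption.

```python
import shlex
import subprocess

def build_command(binary, model, tokenizer, prompt, max_tokens=100, streaming=True):
    """Assemble the lutmac-inference argument list from the flags in this README."""
    cmd = [
        str(binary),
        "--model", str(model),
        "--tokenizer", str(tokenizer),
        "--prompt", prompt,
        "--max-tokens", str(max_tokens),
    ]
    if streaming:
        cmd.append("--streaming")
    return cmd

def run_inference(cmd):
    """Run the engine; requires the binary built in step 1 to exist."""
    return subprocess.run(cmd, check=True)

cmd = build_command(
    "./lutmac-inference",
    "./qwen2.5-1.5b-instruct-4bit.lutmac",
    "./tokenizer.json",
    "What is the capital of France?",
)
print(shlex.join(cmd))  # shell-quoted form, useful for logging
```

Passing the arguments as a list avoids shell quoting problems when the prompt contains spaces or quotes.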
Quantization Details
These models were quantized using the bit-serial LUT engine methodology. Sub-4-bit models apply a Hadamard rotation (fast Walsh-Hadamard transform, FWHT) to both weights and activations. The rotation spreads outlier values across many dimensions, which helps keep the models stable even at extreme compression rates.
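To illustrate why a Hadamard rotation can be applied to both weights and activations, here is a minimal pure-Python sketch of an orthonormal FWHT. It is not LutMac's implementation: the function name `fwht` and the sample vectors are hypothetical, and incoherence-processing schemes often add random sign flips, which are omitted here. Because the transform is orthonormal, rotating both vectors leaves their dot product unchanged while shrinking the largest weight magnitude.

```python
import math

def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    y = list(x)
    h = 1
    while h < n:
        # Butterfly step: combine pairs that are h apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = y[i], y[i + h]
                y[i], y[i + h] = a + b, a - b
        h *= 2
    norm = 1.0 / math.sqrt(n)
    return [v * norm for v in y]

weights = [0.1, -0.2, 0.05, 8.0, -0.1, 0.15, 0.0, -0.05]  # one large outlier
activations = [1.0, 0.5, -0.5, 2.0, 0.0, 1.0, -1.0, 0.25]

w_rot = fwht(weights)
x_rot = fwht(activations)

dot = sum(w * x for w, x in zip(weights, activations))
dot_rot = sum(w * x for w, x in zip(w_rot, x_rot))

assert abs(dot - dot_rot) < 1e-9                              # output is preserved
assert max(abs(v) for v in w_rot) < max(abs(v) for v in weights)  # outlier is spread out
```

A smaller peak magnitude means a coarse quantization grid wastes less of its range on a single extreme value.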
- 4-bit and above: Symmetric integer quantization.
- Sub-4-bit: Recursive Residual Quantization (RRQ) combined with Incoherence Processing (QuIP-based rotation).
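The two schemes in the list above can be sketched generically. The code below is an assumption-laden illustration, not the LutMac algorithm: it uses a single per-tensor scale (real engines commonly use per-group scales), and it interprets residual quantization as repeatedly quantizing whatever error the previous stages left behind. The names `quantize_symmetric`, `dequantize`, and `residual_quantize` are hypothetical.

```python
def quantize_symmetric(values, bits):
    """Symmetric quantization to signed integers in [-qmax, qmax] with one scale."""
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits")
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale

def dequantize(codes, scale):
    """Map integer codes back to approximate real values."""
    return [c * scale for c in codes]

def residual_quantize(values, bits, stages):
    """Quantize, then quantize the remaining error, for a fixed number of stages."""
    approx = [0.0] * len(values)
    layers = []
    for _ in range(stages):
        residual = [v - a for v, a in zip(values, approx)]
        codes, scale = quantize_symmetric(residual, bits)
        layers.append((codes, scale))
        approx = [a + s for a, s in zip(approx, dequantize(codes, scale))]
    return layers, approx

def max_error(values, approx):
    return max(abs(v - a) for v, a in zip(values, approx))

weights = [0.42, -1.30, 0.07, 0.88, -0.51, 1.10]

_, approx_int8 = residual_quantize(weights, bits=8, stages=1)  # plain symmetric Int8
_, approx_1 = residual_quantize(weights, bits=2, stages=1)     # one coarse stage
_, approx_3 = residual_quantize(weights, bits=2, stages=3)     # three residual stages

print(max_error(weights, approx_int8))
print(max_error(weights, approx_1))
print(max_error(weights, approx_3))
```

In this sketch each extra stage stores another set of low-bit codes plus a scale, trading file size for a smaller reconstruction error.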
Experimental Project: This is part of ongoing research into ultra-low-bit CPU inference. Contributions and feedback are welcome at the main repository.