Qwen2.5-1.5B-Instruct-lutmac

This repository contains ultra-low-bit quantized versions of Qwen2.5-1.5B-Instruct, optimized for the LutMac inference engine.

Available Variants

| Precision | Format  | Size    | Description                                                 |
|-----------|---------|---------|-------------------------------------------------------------|
| 8-bit     | .lutmac | 1.49 GB | Standard Int8 quantization.                                 |
| 6-bit     | .lutmac | 1.18 GB | Int6 quantization for high efficiency.                      |
| 5-bit     | .lutmac | 1.03 GB | Int5 quantization balancing size and quality.               |
| 4-bit     | .lutmac | 870 MB  | Optimized 4-bit quantization with tied embeddings (8-bit).  |
| 3-bit     | .lutmac | 714 MB  | Int3 quantization for memory-constrained devices.           |
| 2-bit     | .lutmac | 578 MB  | 2-bit quantization using Hadamard Rotation and RRQ.         |
| 1.58-bit  | .lutmac | 578 MB  | Ternary quantization {-1, 0, +1} (sign-magnitude encoding). |
| 1-bit     | .lutmac | 402 MB  | Binary quantization {-1, +1} (purely bit-serial).           |
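To make the ternary (1.58-bit) entry concrete, here is a minimal NumPy sketch of {-1, 0, +1} quantization with a per-tensor absolute-mean scale. The function names and the scale choice are illustrative assumptions common to ternary schemes, not LutMac's actual encoding.

```python
import numpy as np

def ternary_quantize(w, eps=1e-8):
    # Per-tensor scale from the mean absolute weight (a common choice
    # for ternary schemes; LutMac's exact scale may differ).
    scale = float(np.mean(np.abs(w))) + eps
    # Round to the nearest of {-1, 0, +1}: the sign occupies one bit,
    # zero-vs-nonzero magnitude the other (sign-magnitude encoding).
    q = np.clip(np.round(w / scale), -1, 1).astype(np.int8)
    return q, scale

def ternary_dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)
q, s = ternary_quantize(w)
w_hat = ternary_dequantize(q, s)
```

Storing only a sign bit and a magnitude bit per weight (plus one scale per tensor) is what pushes the footprint below 2 bits in practice.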

How to Run Inference

To run these models, you need the LutMac engine installed. You can find the source code and build instructions at: https://github.com/YASSERRMD/lutmac

1. Build the Engine

git clone https://github.com/YASSERRMD/lutmac.git
cd lutmac
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j4

2. Run the Model

Download your preferred .lutmac file and the tokenizer.json from this repository.

./lutmac-inference \
    --model ./qwen2.5-1.5b-instruct-4bit.lutmac \
    --tokenizer ./tokenizer.json \
    --prompt "What is the capital of France?" \
    --max-tokens 100 \
    --streaming

Quantization Details

These models were quantized using the bit-serial LUT engine methodology. Sub-4-bit models utilize Hadamard Rotation (FWHT) on both weights and activations to mitigate the impact of outliers, ensuring stability even at extreme compression rates.

  • 4-bit and above: Symmetric integer quantization.
  • Sub-4-bit: Recursive Residual Quantization (RRQ) combined with Incoherence Processing (QuIP-based rotation).
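The two bullets above can be sketched together in NumPy: rotate the tensor with a fast Walsh-Hadamard transform to spread outliers, then apply recursive residual quantization, where each stage quantizes the error left by the stages before it. The inner quantizer below is the generic symmetric integer scheme from the first bullet; stage counts, bit widths, and function names are illustrative assumptions, not LutMac's implementation.

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform (length must be a power of two).
    Orthonormal, so applying it twice recovers the input."""
    x = x.astype(np.float64).copy()
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x / np.sqrt(n)

def sym_quantize(x, bits):
    """Symmetric integer quantization (the '4-bit and above' case).
    Returns the dequantized approximation."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax + 1e-12
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

def rrq(x, bits=2, stages=3):
    """Recursive Residual Quantization: each stage quantizes the
    residual error of the running approximation."""
    approx = np.zeros_like(x)
    for _ in range(stages):
        approx += sym_quantize(x - approx, bits)
    return approx

rng = np.random.default_rng(0)
w = rng.standard_normal(64)
rotated = fwht(w)                # incoherence processing step
approx = rrq(rotated, bits=2)    # low-bit RRQ in the rotated domain
recovered = fwht(approx)         # FWHT is its own inverse
```

Because the transform is orthonormal, the reconstruction error in the rotated domain equals the error after rotating back, and each RRQ stage shrinks the worst-case residual, which is why a few 2-bit stages can remain stable where a single 2-bit pass would not.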

Experimental Project: This is part of ongoing research into ultra-low-bit CPU inference. Contributions and feedback are welcome at the main repository.

Base model: Qwen/Qwen2.5-1.5B (via Qwen2.5-1.5B-Instruct)