Qwen2.5-1.5B-Instruct-lutmac

This repository contains ultra-low-bit quantized versions of Qwen2.5-1.5B-Instruct, optimized for the LutMac inference engine.

Available Variants

| Precision | Format  | Size    | Description                                                 |
|-----------|---------|---------|-------------------------------------------------------------|
| 8-bit     | .lutmac | 1.49 GB | Standard Int8 quantization.                                 |
| 6-bit     | .lutmac | 1.18 GB | Int6 quantization for high efficiency.                      |
| 5-bit     | .lutmac | 1.03 GB | Int5 quantization balancing size and quality.               |
| 4-bit     | .lutmac | 870 MB  | Optimized 4-bit quantization with tied embeddings (8-bit).  |
| 3-bit     | .lutmac | 714 MB  | Int3 quantization for memory-constrained devices.           |
| 2-bit     | .lutmac | 578 MB  | 2-bit quantization using Hadamard Rotation and RRQ.         |
| 1.58-bit  | .lutmac | 578 MB  | Ternary quantization {-1, 0, +1} (sign-magnitude encoding). |
| 1-bit     | .lutmac | 402 MB  | Binary quantization {-1, +1} (purely bit-serial).           |
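To make the ternary (1.58-bit) entry concrete, here is a minimal NumPy sketch of {-1, 0, +1} quantization with a per-tensor absolute-mean scale. The function names and the scale choice are illustrative assumptions common to ternary schemes, not LutMac's actual encoding.

```python
import numpy as np

def ternary_quantize(w, eps=1e-8):
    # Per-tensor scale from the mean absolute weight (a common choice
    # for ternary schemes; LutMac's exact scale may differ).
    scale = float(np.mean(np.abs(w))) + eps
    # Round to the nearest of {-1, 0, +1}: the sign occupies one bit,
    # zero-vs-nonzero magnitude the other (sign-magnitude encoding).
    q = np.clip(np.round(w / scale), -1, 1).astype(np.int8)
    return q, scale

def ternary_dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)
q, s = ternary_quantize(w)
w_hat = ternary_dequantize(q, s)
```

Storing only a sign bit and a magnitude bit per weight (plus one scale per tensor) is what pushes the footprint below 2 bits in practice.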

How to Run Inference

To run these models, you need the LutMac engine installed. You can find the source code and build instructions at: https://github.com/YASSERRMD/lutmac

1. Build the Engine

git clone https://github.com/YASSERRMD/lutmac.git
cd lutmac
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j4

2. Run the Model

Download your preferred .lutmac file and the tokenizer.json from this repository.

./lutmac-inference \
    --model ./qwen2.5-1.5b-instruct-4bit.lutmac \
    --tokenizer ./tokenizer.json \
    --prompt "What is the capital of France?" \
    --max-tokens 100 \
    --streaming

Quantization Details

These models were quantized using the bit-serial LUT engine methodology. Sub-4-bit models utilize Hadamard Rotation (FWHT) on both weights and activations to mitigate the impact of outliers, ensuring stability even at extreme compression rates.

  • 4-bit and above: Symmetric integer quantization.
  • Sub-4-bit: Recursive Residual Quantization (RRQ) combined with Incoherence Processing (QuIP-based rotation).
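The two bullets above can be sketched together in NumPy: rotate the tensor with a fast Walsh-Hadamard transform to spread outliers, then apply recursive residual quantization, where each stage quantizes the error left by the stages before it. The inner quantizer below is the generic symmetric integer scheme from the first bullet; stage counts, bit widths, and function names are illustrative assumptions, not LutMac's implementation.

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform (length must be a power of two).
    Orthonormal, so applying it twice recovers the input."""
    x = x.astype(np.float64).copy()
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x / np.sqrt(n)

def sym_quantize(x, bits):
    """Symmetric integer quantization (the '4-bit and above' case).
    Returns the dequantized approximation."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax + 1e-12
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

def rrq(x, bits=2, stages=3):
    """Recursive Residual Quantization: each stage quantizes the
    residual error of the running approximation."""
    approx = np.zeros_like(x)
    for _ in range(stages):
        approx += sym_quantize(x - approx, bits)
    return approx

rng = np.random.default_rng(0)
w = rng.standard_normal(64)
rotated = fwht(w)                # incoherence processing step
approx = rrq(rotated, bits=2)    # low-bit RRQ in the rotated domain
recovered = fwht(approx)         # FWHT is its own inverse
```

Because the transform is orthonormal, the reconstruction error in the rotated domain equals the error after rotating back, and each RRQ stage shrinks the worst-case residual, which is why a few 2-bit stages can remain stable where a single 2-bit pass would not.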

Experimental Project: This is part of ongoing research into ultra-low-bit CPU inference. Contributions and feedback are welcome at the main repository.

Base model: Qwen/Qwen2.5-1.5B (via Qwen2.5-1.5B-Instruct)