Loading quantized model on CPU only

I ran into a similar issue and fixed it as follows:

I used BitsAndBytesConfig:

from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_enable_fp32_cpu_offload=True,
)
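To illustrate what `llm_int8_enable_fp32_cpu_offload` is for: when parts of the model are dispatched to CPU, they are kept in fp32 rather than int8, and the flag tells bitsandbytes to allow that. A custom `device_map` dict (as accepted by `from_pretrained`) makes this explicit; the module names below are illustrative, not from a specific model:

```python
# Hypothetical device_map: modules mapped to GPU 0 stay in int8, while
# modules mapped to "cpu" are kept in fp32 (which is what
# llm_int8_enable_fp32_cpu_offload permits). Module names are examples only.
device_map = {
    "model.embed_tokens": 0,   # GPU 0, int8
    "model.layers": 0,         # GPU 0, int8
    "model.norm": "cpu",       # CPU, fp32
    "lm_head": "cpu",          # CPU, fp32
}
```

Passing `device_map="cpu"` as in the snippet below is the all-CPU special case of this.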

And then created the model object as:

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    quantization_config=bnb_config,
    device_map="cpu"
)
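Once the model loads, inference works the same as with an unquantized model. A minimal sketch, assuming the `model` object from above plus a tokenizer loaded from the same `MODEL_NAME`:

```python
# Small helper for running generation with the model loaded above.
# Assumes `model` and a matching tokenizer (AutoTokenizer.from_pretrained(MODEL_NAME)).
def generate_text(model, tokenizer, prompt, max_new_tokens=20):
    """Tokenize the prompt, generate, and return the decoded text."""
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Usage (assumes the objects above exist):
# tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# print(generate_text(model, tokenizer, "Hello, my name is"))
```

Expect CPU generation to be noticeably slower than GPU, especially for the fp32-offloaded modules.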

You can follow the instructions in the bitsandbytes Installation Guide to install it on Intel CPUs. Below are the commands:

git clone --depth 1 -b multi-backend-refactor https://github.com/bitsandbytes-foundation/bitsandbytes.git && cd bitsandbytes/
pip install intel_extension_for_pytorch
pip install -r requirements-dev.txt
cmake -DCOMPUTE_BACKEND=cpu -S .
make
pip install -e .   # `-e` for "editable" install, when developing BNB (otherwise leave that out)

Hope it helps!