I observed a similar issue and fixed it as follows.
I used BitsAndBytesConfig:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_8bit=True,                      # quantize the model's linear layers to 8-bit
    llm_int8_enable_fp32_cpu_offload=True,  # let modules offloaded to the CPU stay in FP32
)
Then I created the model object:

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    quantization_config=bnb_config,
    device_map="cpu",
)
You can follow the instructions from the Installation Guide to install bitsandbytes on Intel CPUs. Below are the commands:
git clone --depth 1 -b multi-backend-refactor https://github.com/bitsandbytes-foundation/bitsandbytes.git && cd bitsandbytes/
pip install intel_extension_for_pytorch
pip install -r requirements-dev.txt
cmake -DCOMPUTE_BACKEND=cpu -S .
make
pip install -e . # `-e` for "editable" install, when developing BNB (otherwise leave that out)
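After the build, you can sanity-check the install by running python -m bitsandbytes in the same environment; in my experience it prints the detected backend and runs the library's self-diagnostics (this check is my suggestion, not part of the guide above).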
Hope it helps!