SmolVLM 8bit Quantization Problem

I quantized HuggingFaceTB/SmolVLM-Instruct to 8-bit using the code below (i.e. the snippet from the model's page).

from transformers import AutoModelForVision2Seq, BitsAndBytesConfig
import torch

# quantize to 8-bit with bitsandbytes at load time
quantization_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForVision2Seq.from_pretrained(
    "HuggingFaceTB/SmolVLM-Instruct",
    quantization_config=quantization_config,
)

Local inference after quantization works fine, and uploading to the Hugging Face Hub also works fine. As far as I understand, something goes wrong during serialization: when I download the quantized model from my repository

uisikdag/SmolVLM-Instruct-8bit

on Hugging Face and run it, I get an error about mismatched tensor sizes. Any help would be appreciated.
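Roughly, the round trip looks like this (a minimal sketch of the steps described above; the exact upload/reload calls may differ slightly from what I ran):

# push the 8-bit quantized model to the Hub (this step completes without errors)
model.push_to_hub("uisikdag/SmolVLM-Instruct-8bit")

# later: reload the quantized checkpoint from the Hub -- this is where the
# size-mismatch error appears; the saved config.json already records the
# 8-bit quantization settings
reloaded = AutoModelForVision2Seq.from_pretrained("uisikdag/SmolVLM-Instruct-8bit")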

Edit: This issue does not occur with 4-bit NF4 quantization. That works completely fine.
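For reference, a working 4-bit NF4 config looks something like this (the options beyond nf4 are typical choices, not necessarily the exact ones I used):

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)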


It was also reproduced cleanly here. It seems to be a bitsandbytes issue that has been unresolved for a long time. The fact that it occurs in an official HF model may be a chance for it to finally get fixed.

Edit:
It seems that this is a different problem. It’s worse than the one above.

Edit:
Maybe this issue.


I’ve tried rolling the library versions back as far as possible, but I can’t avoid the error. I hope I’m just making an easy mistake…

# Minimal reproduction: quantize to 8-bit, save the quantized model locally, then reload the saved checkpoint
from transformers import AutoProcessor, AutoModelForVision2Seq, BitsAndBytesConfig
import torch

quantization_config = BitsAndBytesConfig(load_in_8bit=True)
#quantization_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_use_double_quant=True, bnb_4bit_compute_dtype=torch.bfloat16)

temp_model_dir = "temp_model"
model_id = "HuggingFaceTB/SmolVLM-Instruct"
#model_id = "uisikdag/SmolVLM-Instruct-8bit" # bnb quantized
#model_id = "Salesforce/blip2-opt-2.7b" # for reproduction with older transformers versions

# quantize on load, then serialize the quantized model to disk
temp_model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.bfloat16, quantization_config=quantization_config)
temp_model.save_pretrained(temp_model_dir)

# reload the serialized 8-bit checkpoint
#temp_model_dir = model_id # if this is enabled, it will not crash
model = AutoModelForVision2Seq.from_pretrained(temp_model_dir, quantization_config=quantization_config) # crashes here
#processor = AutoProcessor.from_pretrained(temp_model_dir)
#processor = AutoProcessor.from_pretrained(temp_model_dir)

# dependencies
"""
torch==2.4.0
accelerate==1.0.0
huggingface_hub==0.26.0
transformers==4.46.0
bitsandbytes==0.44.0
numpy<2
peft==0.12.0
safetensors==0.4.3
"""

Thanks for trying!
