Error: model cannot be quantized after a LoRA adapter has been merged into it via merge_and_unload()

Hello!

Attempting to quantize a model to 4-bit after a QLoRA adapter has been merged into it raises an error. See the code below.

I consistently encounter:

ValueError: Blockwise quantization only supports 16/32-bit floats, but got {A.dtype}

which appears to be raised by bitsandbytes when the tensors it is asked to quantize are not 16/32-bit floats.

from transformers import AutoModelForCausalLM
from peft import LoraConfig, PeftModel, prepare_model_for_kbit_training

# Load the base model in 4-bit and train a LoRA adapter on top of it (QLoRA).
model = AutoModelForCausalLM.from_pretrained("./base_model", load_in_4bit=True)
model = prepare_model_for_kbit_training(model)
peft_config = LoraConfig(r=32, lora_alpha=32, target_modules=["k_proj", "q_proj", "v_proj"])
# ...... (LoRA wrapping and trainer setup omitted)
trainer.train()
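
In case the exact quantization settings matter: above I use the plain load_in_4bit=True flag; below is a sketch of an equivalent explicit BitsAndBytesConfig for the same load. The quant_type and compute_dtype values are illustrative assumptions (typical QLoRA defaults), not settings I am certain the error depends on.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Hypothetical explicit 4-bit config (illustrative values only).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained("./base_model", quantization_config=bnb_config)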

# Apply the adapter to the original (non-quantized) base model and merge it in.
model = AutoModelForCausalLM.from_pretrained("./base_model")
model = PeftModel.from_pretrained(model, "./output/checkpoint-10/")
model = model.merge_and_unload()
model.save_pretrained("./merged")
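
For reference, here is a minimal sketch of how the merged checkpoint can be inspected before re-quantizing (standard transformers calls; nothing here is specific to my setup). I would expect to see only float dtypes and no leftover quantization_config:

import json
from transformers import AutoModelForCausalLM

# Reload the merged checkpoint without quantization and list the parameter dtypes
# that bitsandbytes will later be asked to quantize.
merged = AutoModelForCausalLM.from_pretrained("./merged")
print({p.dtype for p in merged.parameters()})

# Check whether a quantization_config was carried over into the saved config.
with open("./merged/config.json") as f:
    print(json.load(f).get("quantization_config"))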

# This line raises the error:
#   ValueError: Blockwise quantization only supports 16/32-bit floats, but got {A.dtype}
# How do I overcome this?
model = AutoModelForCausalLM.from_pretrained("./merged", load_in_4bit=True)
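
Would the right fix be to merge into a 16-bit copy of the base model and then pass an explicit quantization config when reloading? A minimal, untested sketch of what I have in mind; the fp16 cast, the ./merged_fp16 path, and the config values are my assumptions, not a confirmed solution:

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Merge the adapter into an fp16 copy of the base model and save it
# (hypothetical output path ./merged_fp16).
base = AutoModelForCausalLM.from_pretrained("./base_model", torch_dtype=torch.float16)
merged = PeftModel.from_pretrained(base, "./output/checkpoint-10/").merge_and_unload()
merged.save_pretrained("./merged_fp16")

# Reload the merged fp16 checkpoint with 4-bit quantization.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
model = AutoModelForCausalLM.from_pretrained("./merged_fp16", quantization_config=bnb_config)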