Do I need to dequantize before merging the QLoRA adapter?

In this DPO trainer link,

it says:


> As suggested by [Benjamin Marie](https://medium.com/@bnjmn_marie/dont-merge-your-lora-adapter-into-a-4-bit-llm-65b6da287997), the best option for merging QLoRA adapters is to first dequantize the base model, then merge the adapter. Something similar to [this script](https://github.com/jondurbin/qlora/blob/main/qmerge.py).

Do we still need to do this manual merge? I feel like PEFT has already integrated this feature.
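By "integrated" I mean I was hoping the direct path would just work on the quantized model, something like this (just a sketch; `trainer.model` is my QLoRA PeftModel and `output_path` is where I want to save the merged weights):

```python
# Hoped-for direct path: merge the LoRA adapter straight into the 4-bit base,
# letting PEFT handle the per-layer dequantize / merge / requantize internally.
merged_model = trainer.model.merge_and_unload()
merged_model.save_pretrained(output_path)
```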

I use the latest version of PEFT from pip, not as new as the GitHub version…
Maybe PEFT alone supports this feature.
However, applying LoRA with quantization either doesn't work, or seems to work but then causes errors during inference.
The error differs between diffusers and transformers, but it is almost the same kind of error.
I am not sure whether this is due to insufficient support in the HF libraries, or bugs or compatibility issues between PEFT, torch, bitsandbytes, or other libraries, but under the current circumstances I too recommend dequantizing first and then merging.
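The dequantize-first route I have in mind looks roughly like this (only a sketch; `base_model_name`, `adapter_path` and `output_path` are placeholders for your own checkpoint, adapter folder and output directory):

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Reload the base model in bf16 instead of 4-bit
base_model = AutoModelForCausalLM.from_pretrained(
    "base_model_name",
    torch_dtype=torch.bfloat16,
)

# Attach the QLoRA adapter that was trained on the 4-bit model
peft_model = PeftModel.from_pretrained(base_model, "adapter_path")

# Merge the LoRA weights into the unquantized base and save
merged_model = peft_model.merge_and_unload()
merged_model.save_pretrained("output_path")
```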

After I fine-tune the model with LoRA and save the model (a PeftModel), I run:

```python
import torch
from peft import PeftModel

base_model = trainer.model.model  # base model inside the PeftModel wrapper
base_model_dequantized = qmerge.dequantize_model_2(model=base_model, dtype=torch.bfloat16)
peft_model = PeftModel.from_pretrained(base_model_dequantized, adapter_path)
merged_model = peft_model.merge_and_unload()  # fails here
merged_model.save_pretrained(output_path, safe_serialization=False)
```

It shows this error when doing the merge:

```text
[rank0]:   File "/opt/miniconda/lib/python3.9/site-packages/peft/tuners/lora/bnb.py", line 344, in merge
[rank0]:     output = dequantize_bnb_weight(weight, state=weight.quant_state)
[rank0]: AttributeError: 'Parameter' object has no attribute 'quant_state'
```

I think it is this function in the LoRA merge: peft/src/peft/tuners/lora/bnb.py at a0788a3f92c8220f68d2185aeef0266d6b725bfe · huggingface/peft · GitHub

There are several possible causes, but bitsandbytes version problems seem to be a common one. Not sure if the latest version would work…

```bash
pip install -U bitsandbytes
```
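It is also worth checking exactly which versions are installed, for example:

```python
# Print the installed versions of the libraries involved (PyPI package names)
from importlib.metadata import version

for pkg in ["peft", "bitsandbytes", "transformers", "accelerate", "torch"]:
    try:
        print(pkg, version(pkg))
    except Exception:
        print(pkg, "not installed")
```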

I think the reason is that

`base_model_dequantized = qmerge.dequantize_model_2(model=base_model, dtype=torch.bfloat16)`

is using this function at link. So the model is already dequantized, but when I call merge_and_unload() later, it seems to try to do the dequantization again, as in the link above.

This is from this function.
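If that is what is happening, the weights that reach the bnb merge code are already plain torch parameters and no longer carry a quant_state, which would explain the AttributeError. A quick sanity check on my side (just a hypothetical diagnostic) would be something like:

```python
import torch

# Print which Linear weights still carry a bitsandbytes quant_state.
# After the manual dequantization they should all be plain nn.Parameter
# objects without one, matching the AttributeError above.
for name, module in base_model_dequantized.named_modules():
    if isinstance(module, torch.nn.Linear):
        weight = module.weight
        print(name, type(weight).__name__, hasattr(weight, "quant_state"))
```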

How about this?

`base_model_dequantized = base_model.dequantize()`
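That is, slot it into your script in place of the qmerge helper, roughly like this (assuming your transformers version exposes dequantize() for bitsandbytes-quantized models):

```python
from peft import PeftModel

base_model = trainer.model.model
base_model_dequantized = base_model.dequantize()  # transformers' built-in dequantization
peft_model = PeftModel.from_pretrained(base_model_dequantized, adapter_path)
merged_model = peft_model.merge_and_unload()
merged_model.save_pretrained(output_path, safe_serialization=False)
```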

The reason is that `base_model = trainer.model.model` does not actually give the base model: the base model was modified in place, so it is now a LoRA model. So when I run `base_model_dequantized = qmerge.dequantize_model_2(model=base_model, dtype=torch.bfloat16)`, it is not operating on the base model but on the LoRA model.

The base model is `trainer.model.unload()`.
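So the flow I would expect to be correct is roughly this (a sketch based on the above; unload() should strip the LoRA layers and give back the underlying base model before dequantizing):

```python
import torch
from peft import PeftModel

# unload() removes the injected LoRA layers and returns the real base model
base_model = trainer.model.unload()

base_model_dequantized = qmerge.dequantize_model_2(model=base_model, dtype=torch.bfloat16)

# Re-attach the saved adapter to the dequantized base and merge it in
peft_model = PeftModel.from_pretrained(base_model_dequantized, adapter_path)
merged_model = peft_model.merge_and_unload()
merged_model.save_pretrained(output_path, safe_serialization=False)
```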

Could this be PEFT's merge_and_unload() calling dequantize_model()?
If so, then the dequantization would still be executed with the following code:

```python
from peft import PeftModel

base_model = trainer.model.model
# base_model_dequantized = qmerge.dequantize_model_2(model=base_model, dtype=torch.bfloat16)
peft_model = PeftModel.from_pretrained(base_model, adapter_path)
merged_model = peft_model.merge_and_unload()  # dequantization would happen in here
merged_model.save_pretrained(output_path, safe_serialization=False)
```

The reason is that trainer.model is still a PEFT model. We can only get the base model using the unload() method.
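A quick way to confirm that (just inspecting module names):

```python
# If trainer.model.model were really the untouched base model,
# no module names would contain "lora_".
lora_modules = [name for name, _ in trainer.model.model.named_modules() if "lora_" in name]
print(f"{len(lora_modules)} LoRA modules still present in trainer.model.model")
```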

Hmmm, if that is the case, I wonder what is going on.
I think it might work if we can stop the dequantization from being called twice…
I think this is a bug…
I wonder if the GitHub version of PEFT would fix it?