Do I need to dequantize before merging the QLoRA adapter?

In this DPO trainer link,

it says:


> As suggested by [Benjamin Marie](https://medium.com/@bnjmn_marie/dont-merge-your-lora-adapter-into-a-4-bit-llm-65b6da287997), the best option for merging QLoRA adapters is to first dequantize the base model, then merge the adapter. Something similar to [this script](https://github.com/jondurbin/qlora/blob/main/qmerge.py).

Do we still need to do this manual merge? I feel like PEFT has already integrated this feature.
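By "integrated" I mean I was hoping the direct path would just work on the quantized model, something like this (just a sketch; `trainer.model` is my QLoRA PeftModel and `output_path` is where I want to save the merged weights):

```python
# Hoped-for direct path: merge the LoRA adapter straight into the 4-bit base,
# letting PEFT handle the per-layer dequantize / merge / requantize internally.
merged_model = trainer.model.merge_and_unload()
merged_model.save_pretrained(output_path)
```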

I use the latest version of PEFT from pip, not as new as the GitHub version…
Maybe PEFT alone supports this feature.
However, applying LoRA with quantization either doesn't work, or seems to work but then causes errors during inference.
The error differs between diffusers and transformers, but it is almost the same kind of error.
I am not sure whether this is due to insufficient support in the HF libraries, or bugs or compatibility issues between PEFT, torch, bitsandbytes, or other libraries, but under the current circumstances I too recommend dequantizing first and then merging.
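The dequantize-first route I have in mind looks roughly like this (only a sketch; `base_model_name`, `adapter_path` and `output_path` are placeholders for your own checkpoint, adapter folder and output directory):

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Reload the base model in bf16 instead of 4-bit
base_model = AutoModelForCausalLM.from_pretrained(
    "base_model_name",
    torch_dtype=torch.bfloat16,
)

# Attach the QLoRA adapter that was trained on the 4-bit model
peft_model = PeftModel.from_pretrained(base_model, "adapter_path")

# Merge the LoRA weights into the unquantized base and save
merged_model = peft_model.merge_and_unload()
merged_model.save_pretrained("output_path")
```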

After I fine-tune the model with LoRA and save the model (a PeftModel), I run:

```python
import torch
from peft import PeftModel

base_model = trainer.model.model  # base model inside the PeftModel wrapper
base_model_dequantized = qmerge.dequantize_model_2(model=base_model, dtype=torch.bfloat16)
peft_model = PeftModel.from_pretrained(base_model_dequantized, adapter_path)
merged_model = peft_model.merge_and_unload()  # fails here
merged_model.save_pretrained(output_path, safe_serialization=False)
```

It shows this error when doing the merge:

```text
[rank0]:   File "/opt/miniconda/lib/python3.9/site-packages/peft/tuners/lora/bnb.py", line 344, in merge
[rank0]:     output = dequantize_bnb_weight(weight, state=weight.quant_state)
[rank0]: AttributeError: 'Parameter' object has no attribute 'quant_state'
```

I think it is this function in the LoRA merge: peft/src/peft/tuners/lora/bnb.py at a0788a3f92c8220f68d2185aeef0266d6b725bfe · huggingface/peft · GitHub

There are several possible causes, but bitsandbytes version problems seem to be a common one. Not sure if the latest version would work…

```bash
pip install -U bitsandbytes
```
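It is also worth checking exactly which versions are installed, for example:

```python
# Print the installed versions of the libraries involved (PyPI package names)
from importlib.metadata import version

for pkg in ["peft", "bitsandbytes", "transformers", "accelerate", "torch"]:
    try:
        print(pkg, version(pkg))
    except Exception:
        print(pkg, "not installed")
```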

I think the reason is that

`base_model_dequantized = qmerge.dequantize_model_2(model=base_model, dtype=torch.bfloat16)`

is using this function at link. So the model is already dequantized, but when I call merge_and_unload() later, it seems to try to do the dequantization again, as in the link above.

This is from this function.
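If that is what is happening, the weights that reach the bnb merge code are already plain torch parameters and no longer carry a quant_state, which would explain the AttributeError. A quick sanity check on my side (just a hypothetical diagnostic) would be something like:

```python
import torch

# Print which Linear weights still carry a bitsandbytes quant_state.
# After the manual dequantization they should all be plain nn.Parameter
# objects without one, matching the AttributeError above.
for name, module in base_model_dequantized.named_modules():
    if isinstance(module, torch.nn.Linear):
        weight = module.weight
        print(name, type(weight).__name__, hasattr(weight, "quant_state"))
```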

How about this?

`base_model_dequantized = base_model.dequantize()`
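That is, slot it into your script in place of the qmerge helper, roughly like this (assuming your transformers version exposes dequantize() for bitsandbytes-quantized models):

```python
from peft import PeftModel

base_model = trainer.model.model
base_model_dequantized = base_model.dequantize()  # transformers' built-in dequantization
peft_model = PeftModel.from_pretrained(base_model_dequantized, adapter_path)
merged_model = peft_model.merge_and_unload()
merged_model.save_pretrained(output_path, safe_serialization=False)
```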

The reason is that `base_model = trainer.model.model` does not actually give the base model: the base model was modified in place, so it is now a LoRA model. So when I run `base_model_dequantized = qmerge.dequantize_model_2(model=base_model, dtype=torch.bfloat16)`, it is not operating on the base model but on the LoRA model.

The base model is `trainer.model.unload()`.
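So the flow I would expect to be correct is roughly this (a sketch based on the above; unload() should strip the LoRA layers and give back the underlying base model before dequantizing):

```python
import torch
from peft import PeftModel

# unload() removes the injected LoRA layers and returns the real base model
base_model = trainer.model.unload()

base_model_dequantized = qmerge.dequantize_model_2(model=base_model, dtype=torch.bfloat16)

# Re-attach the saved adapter to the dequantized base and merge it in
peft_model = PeftModel.from_pretrained(base_model_dequantized, adapter_path)
merged_model = peft_model.merge_and_unload()
merged_model.save_pretrained(output_path, safe_serialization=False)
```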

Could this be PEFT's merge_and_unload() calling dequantize_model()?
If so, then the dequantization would still be executed with the following code:

```python
from peft import PeftModel

base_model = trainer.model.model
# base_model_dequantized = qmerge.dequantize_model_2(model=base_model, dtype=torch.bfloat16)
peft_model = PeftModel.from_pretrained(base_model, adapter_path)
merged_model = peft_model.merge_and_unload()  # dequantization would happen in here
merged_model.save_pretrained(output_path, safe_serialization=False)
```

The reason is that trainer.model is still a PEFT model. We can only get the base model using the unload() method.
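A quick way to confirm that (just inspecting module names):

```python
# If trainer.model.model were really the untouched base model,
# no module names would contain "lora_".
lora_modules = [name for name, _ in trainer.model.model.named_modules() if "lora_" in name]
print(f"{len(lora_modules)} LoRA modules still present in trainer.model.model")
```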

Hmmm, if that is the case, I wonder what is going on.
I think it might work if we can stop the dequantization from being called twice…
I think this is a bug…
I wonder if the GitHub version of PEFT would fix it?