Error: model cannot be quantized after a LoRA adapter has been merged into it via merge_and_unload()

Hello!

Attempting to quantize a model to 4-bit after a QLoRA adapter has been merged into it raises an error. See the code below.

I consistently encounter:

ValueError: Blockwise quantization only supports 16/32-bit floats, but got {A.dtype}

which appears to be raised by bitsandbytes when the tensors it is asked to quantize are not 16/32-bit floats.

from transformers import AutoModelForCausalLM
from peft import LoraConfig, PeftModel, prepare_model_for_kbit_training

# Load the base model in 4-bit and train a LoRA adapter on top of it (QLoRA).
model = AutoModelForCausalLM.from_pretrained("./base_model", load_in_4bit=True)
model = prepare_model_for_kbit_training(model)
peft_config = LoraConfig(r=32, lora_alpha=32, target_modules=["k_proj", "q_proj", "v_proj"])
# ...... (LoRA wrapping and trainer setup omitted)
trainer.train()
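
In case the exact quantization settings matter: above I use the plain load_in_4bit=True flag; below is a sketch of an equivalent explicit BitsAndBytesConfig for the same load. The quant_type and compute_dtype values are illustrative assumptions (typical QLoRA defaults), not settings I am certain the error depends on.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Hypothetical explicit 4-bit config (illustrative values only).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained("./base_model", quantization_config=bnb_config)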

# Apply the adapter to the original (non-quantized) base model and merge it in.
model = AutoModelForCausalLM.from_pretrained("./base_model")
model = PeftModel.from_pretrained(model, "./output/checkpoint-10/")
model = model.merge_and_unload()
model.save_pretrained("./merged")
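
For reference, here is a minimal sketch of how the merged checkpoint can be inspected before re-quantizing (standard transformers calls; nothing here is specific to my setup). I would expect to see only float dtypes and no leftover quantization_config:

import json
from transformers import AutoModelForCausalLM

# Reload the merged checkpoint without quantization and list the parameter dtypes
# that bitsandbytes will later be asked to quantize.
merged = AutoModelForCausalLM.from_pretrained("./merged")
print({p.dtype for p in merged.parameters()})

# Check whether a quantization_config was carried over into the saved config.
with open("./merged/config.json") as f:
    print(json.load(f).get("quantization_config"))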

# This line raises the error:
#   ValueError: Blockwise quantization only supports 16/32-bit floats, but got {A.dtype}
# How do I overcome this?
model = AutoModelForCausalLM.from_pretrained("./merged", load_in_4bit=True)
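
Would the right fix be to merge into a 16-bit copy of the base model and then pass an explicit quantization config when reloading? A minimal, untested sketch of what I have in mind; the fp16 cast, the ./merged_fp16 path, and the config values are my assumptions, not a confirmed solution:

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Merge the adapter into an fp16 copy of the base model and save it
# (hypothetical output path ./merged_fp16).
base = AutoModelForCausalLM.from_pretrained("./base_model", torch_dtype=torch.float16)
merged = PeftModel.from_pretrained(base, "./output/checkpoint-10/").merge_and_unload()
merged.save_pretrained("./merged_fp16")

# Reload the merged fp16 checkpoint with 4-bit quantization.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
model = AutoModelForCausalLM.from_pretrained("./merged_fp16", quantization_config=bnb_config)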