Difference between AutoModelForCausalLM and peft_model.merge_and_unload() for a LoRA model during inference

Radmil · February 8, 2024, 4:16pm

Hello,

I have fine-tuned a model with a PEFT (LoRA) configuration and am trying to load it for inference.

I have read in this answer that loading the model that you have saved:

trainer.model.save_pretrained(new_model)

using AutoModelForCausalLM:

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    new_model,  # directory written by trainer.model.save_pretrained
    quantization_config=bnb_config,
    device_map=device_map
)

would load both the base model and the adapters.
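Assuming trainer.model is a PeftModel (as it is when training with a LoRA config), save_pretrained(new_model) writes only the adapter weights plus an adapter_config.json that records the base model under base_model_name_or_path. Recent transformers versions with peft installed detect that file and load the base model with the adapter attached, keeping the LoRA layers as separate, unmerged modules. The helper below is a hypothetical pure-Python sketch for checking which kind of directory you have. The demo directory, its contents and the model name are made up.

```python
import json
import os
import tempfile


def describe_saved_dir(path):
    """Report whether `path` looks like an adapter-only PEFT checkpoint.

    Hypothetical helper. Assumes the layout written by PeftModel.save_pretrained:
    an adapter_config.json whose "base_model_name_or_path" names the base model.
    """
    config_path = os.path.join(path, "adapter_config.json")
    if not os.path.isfile(config_path):
        return {"adapter_only": False, "base_model": None, "peft_type": None}
    with open(config_path) as f:
        cfg = json.load(f)
    return {
        "adapter_only": True,
        "base_model": cfg.get("base_model_name_or_path"),
        "peft_type": cfg.get("peft_type"),
    }


# Demo on a fake checkpoint directory (all values are made up).
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "adapter_config.json"), "w") as f:
        json.dump({"base_model_name_or_path": "my-org/my-base-model",
                   "peft_type": "LORA", "r": 8, "lora_alpha": 16}, f)
    info = describe_saved_dir(d)
    print(info)
```

If adapter_only is True, the directory holds no full model weights, so loading it always pulls in the base model named in the config.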

However, most tutorials suggest using merge_and_unload() when working with PEFT models:

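from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model first (quantized according to bnb_config)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model,
    low_cpu_mem_usage=True,
    return_dict=True,
    device_map=device_map,
    quantization_config=bnb_config,
)

# Attach the saved LoRA adapter on top of the base model
model = PeftModel.from_pretrained(model=base_model,
                                  model_id=new_model,
                                  quantization_config=bnb_config,
                                  device_map=device_map)

# Fold the adapter weights into the base weights and drop the PEFT wrapper
model = model.merge_and_unload(safe_merge=True)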
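What merging does can be sketched with the standard LoRA formulation: a LoRA-adapted linear layer computes h = W x + (alpha / r) * B (A x), and merging folds the update into a single weight W' = W + (alpha / r) * B A. In full precision both paths give the same output, which is why the two loading methods are expected to agree. The toy below uses tiny made-up matrices in pure Python (no torch) and assumes dropout is disabled at inference and the plain alpha / r scaling (not rsLoRA).

```python
# Toy LoRA layer in pure Python, assuming h = W x + (alpha / r) * B (A x).

def matmul(X, Y):
    # Plain matrix product X @ Y.
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def matvec(M, v):
    # Matrix-vector product M @ v.
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lora_forward(W, A, B, scale, x):
    # Unmerged path: base projection plus the scaled low-rank update.
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + scale * d for b, d in zip(base, delta)]


def merge(W, A, B, scale):
    # Merged weight: W' = W + scale * (B @ A).
    BA = matmul(B, A)
    return [[w + scale * ba for w, ba in zip(w_row, ba_row)]
            for w_row, ba_row in zip(W, BA)]


W = [[1.0, 0.0, 0.5], [0.0, 2.0, -1.0]]  # out_features=2, in_features=3
A = [[0.1, 0.2, 0.3]]                    # r=1 x in_features=3
B = [[0.5], [-0.4]]                      # out_features=2 x r=1
r, alpha = 1, 2
scale = alpha / r
x = [1.0, -1.0, 2.0]

unmerged = lora_forward(W, A, B, scale, x)
merged_out = matvec(merge(W, A, B, scale), x)
print(unmerged, merged_out)
```

Up to floating-point error, both print about [2.5, -4.4], so any real divergence between the two loading paths has to come from something other than the merge formula itself.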

As far as I understand, these two blocks of code should yield the same results, but when I print the model structures they are nothing alike. Moreover, they produce different outputs at inference time: the outputs of the second (merged) model resemble those of the base model, whereas the outputs of the first look like the training data I used.
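One possible contributing factor, not confirmed in this thread: assuming bnb_config requests 4-bit loading, the second block merges the adapter into weights that are stored quantized, so the merged weights must be re-quantized, whereas the first block keeps the LoRA matrices as separate modules next to the quantized base. The toy below uses a hypothetical uniform quantizer (bitsandbytes actually uses blockwise NF4/FP4 with per-block scales) and made-up weights to show how an update smaller than half a quantization step can vanish after re-quantization, leaving weights identical to the quantized base model.

```python
# Toy illustration only: a uniform quantizer with a hypothetical step size,
# standing in for bitsandbytes' blockwise 4-bit formats.

def quantize(w, step):
    # Snap a weight to the nearest multiple of `step`.
    return round(w / step) * step


step = 0.25                            # hypothetical quantization step
base_weights = [0.5, -0.25, 1.0, 0.0]  # made-up base weights
lora_delta = [0.03, -0.05, 0.08, 0.1]  # made-up small update (scale * B A)

base_q = [quantize(w, step) for w in base_weights]

# Unmerged: quantized base plus the adapter kept in higher precision.
unmerged = [q + d for q, d in zip(base_q, lora_delta)]

# Merged into quantized storage: add the update, then re-quantize.
merged_q = [quantize(q + d, step) for q, d in zip(base_q, lora_delta)]

print("base_q  :", base_q)
print("unmerged:", unmerged)
print("merged_q:", merged_q)
print("merged equals base:", merged_q == base_q)
```

In this toy the merged weights round back exactly to the quantized base, while the unmerged path still carries the update. Real 4-bit rounding is coarser in some places and finer in others, so this only illustrates the mechanism, not its size.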

xinchiqiu · March 16, 2024, 3:57pm

I have the same question and hope to get some answers too. Many thanks.

tprochenka · August 2, 2024, 1:57pm

Hi @Radmil, have you found a solution for that? I have the same problem with merge_and_unload().
