Help with merging LoRA weights back into base model :-)

As best as I can tell, the LoraModel merge_and_unload method (peft/ at main · huggingface/peft · GitHub) merges LoRA weights back into the main model.

However, I am having trouble getting a LoraModel type from my PeftModelForCausalLM. My current workflow is to define a pretrained model, define a LoraConfig, and use the get_peft_model function to begin training. This works great, but I want to be able to merge the weights back into the base model and save the result.

My working assumption is that I need to either convert my PeftModelForCausalLM into a LoraModel or initialize the model as a LoraModel prior to training. However, when I copy the example in the LoraModel docstring (peft/ at main · huggingface/peft · GitHub), I get a TypeError (TypeError: LoraModel.__init__() missing 1 required positional argument: 'adapter_name'). When I try passing "lora" as the adapter name, I get another error.

I think that I am fundamentally thinking about this in the wrong way and would love some pointers. Neither Google nor Copilot chat has been able to solve my problem.


I figured this out. The solution is quite simple.

A PeftModelForCausalLM actually exposes the LoraModel methods, so you can call merged_model = peft_model.merge_and_unload() to get back a base model with the LoRA weights applied.

My IDE would not autocomplete merge_and_unload, so I assumed the method wasn't available. I still don't see in the code where this method is inherited and would love for someone to point this out to me if feeling charitable.
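
On where the method comes from: as far as I can tell from the peft source, it isn't class inheritance. PeftModel defines a __getattr__ that forwards unknown attribute lookups to the wrapped base_model (the LoraModel), which would also explain why an IDE can't autocomplete it. Here is a minimal pure-Python sketch of that pattern. The class names Inner and Wrapper are hypothetical stand-ins for LoraModel and PeftModel, not the real classes.

```python
class Inner:
    """Stands in for LoraModel (hypothetical, for illustration only)."""

    def merge_and_unload(self):
        return "merged base model"


class Wrapper:
    """Stands in for PeftModel (hypothetical, for illustration only)."""

    def __init__(self, base_model):
        self.base_model = base_model

    def __getattr__(self, name):
        # Called only when normal lookup fails, so attributes defined on
        # Wrapper itself still take priority.
        if name == "base_model":
            # Avoid infinite recursion if base_model is not set yet.
            raise AttributeError(name)
        return getattr(self.base_model, name)


wrapper = Wrapper(Inner())
print(wrapper.merge_and_unload())             # forwarded to Inner at runtime
print(hasattr(Wrapper, "merge_and_unload"))   # the class itself does not define it
```

Because the lookup only happens at runtime, static tools that inspect the wrapper class have nothing to suggest.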


Looking to get a model file from a base + lora myself. Can you explain in more detail how you were able to do it?

Try this. Basic steps are to:
1/ load the base model
2/ train the base model
3/ save the LoRA adapter
4/ reload the base model at half/full precision
5/ merge the LoRA weights with the base model
6/ save

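import torch
from peft import PeftModel, get_peft_model, prepare_model_for_int8_training
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

# 1/ load the base model in 8-bit and wrap it with the LoRA config (peft_config is your LoraConfig)
base_model = AutoModelForCausalLM.from_pretrained("base_model", load_in_8bit=True, torch_dtype=torch.float16, device_map="auto")
base_model = prepare_model_for_int8_training(base_model)
peft_model = get_peft_model(base_model, peft_config)

# 2/ train (placeholders: fill in your own arguments, model and datasets)
training_args = TrainingArguments()
trainer = Trainer()

# 3/ save the LoRA adapter (lora_adapter is the adapter output directory)
peft_model.save_pretrained(lora_adapter, save_adapter=True, save_config=True)

# 4/ reload the base model by name, not the 8-bit object above, at half/full precision
model_to_merge = PeftModel.from_pretrained(AutoModelForCausalLM.from_pretrained("base_model").to("cuda"), lora_adapter)

# 5/ merge, then 6/ save the merged model ("merged_model" is just an example directory)
merged_model = model_to_merge.merge_and_unload()
merged_model.save_pretrained("merged_model")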
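
For intuition about step 5, here is a toy sketch of what merging does to a single weight matrix, assuming the usual LoRA update W + (alpha / r) * B @ A. The matrices and the helper names matmul and merge_lora are made up for illustration; this is not the peft implementation.

```python
def matmul(x, y):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merge_lora(w, a, b, alpha, r):
    """Return W + (alpha / r) * (B @ A); the result has the same shape as W."""
    delta = matmul(b, a)
    scale = alpha / r
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


# Toy 2x2 base weight with a rank-1 adapter (r = 1).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]      # shape (r, in_features)
B = [[0.5], [1.0]]    # shape (out_features, r)

merged = merge_lora(W, A, B, alpha=2, r=1)
print(merged)  # [[2.0, 2.0], [2.0, 5.0]]
```

The merged matrix has exactly as many entries as W, so folding the adapter in should not by itself change the parameter count.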


Are these the correct steps to create a model version from a base model and my own training run?

A related question: why does the model size on disk almost double after merging, even though the number of parameters remains the same?


But when I try to save the model weights from model_to_merge, I still get only the adapter safetensors and not the safetensors for the entire model.
How do I get that?

I have the same question: why is there a size difference between the base model and the merged model?


I see 2 potential reasons for that, unsure if any of them is applicable to your use case as I cannot see your code.

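  1. If the adapter layers are still attached, the model (base model + LoRA adapters) has the number of parameters of the base model plus the number of parameters of the inserted LoRA adapters. In that case, # params merged_model > # params base_model, hence the increase in size. Note that after merge_and_unload() the LoRA weights are folded into the base weights, so the parameter count should match the base model again.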

  2. When reloading the model, make sure to provide the same data type as in training so that the size stays the same. For example, if the trained model was loaded in half precision, model_to_merge should also be loaded in half precision to get a comparable size, e.g.:

model_to_train = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model_to_merge = PeftModel.from_pretrained(AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16), lora_adapter)

In the example provided by @dmckinno, model_to_train is loaded in 8 bits (load_in_8bit=True), and then model_to_merge is loaded in full precision, since no dtype is provided and the default (float32) is used.
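
The dtype point can be checked with quick arithmetic: the on-disk size is roughly the parameter count times the bytes per parameter. This is a rough sketch, assuming a hypothetical 7-billion-parameter model and ignoring file metadata; checkpoint_size_gb is a made-up helper name.

```python
# Bytes used to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def checkpoint_size_gb(n_params, dtype):
    """Approximate checkpoint size in GB (10^9 bytes), ignoring metadata."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


n_params = 7_000_000_000  # hypothetical parameter count
half = checkpoint_size_gb(n_params, "bfloat16")
full = checkpoint_size_gb(n_params, "float32")
print(half, full, full / half)  # 14.0 28.0 2.0
```

So if the merged checkpoint is about twice the size of the base checkpoint, a half-precision base reloaded as float32 before merging is a likely cause.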



I explain it in more detail here: Config.json is not saving after finetuning Llama 2 - #6 by nielsr. Hope you find it useful!

And this is a bit related: Further finetuning a LoRA finetuned CausalLM Model - #4 by nielsr