Help with merging LoRA weights back into base model :-)

dmckinno · May 25, 2023, 2:04pm

As best as I can tell, the LoraModel merge_and_unload method (peft/lora.py at main · huggingface/peft · GitHub) merges the LoRA weights back into the base model.

However, I am having trouble getting a LoraModel from my PeftModelForCausalLM. My current workflow is to load a pretrained model, define a LoraConfig, and use the get_peft_model function to begin training. This works great, but I want to be able to merge the weights back into the base model and save it.

My working assumption is that I need to either convert my PeftModelForCausalLM into a LoraModel or initialize the model as a LoraModel before training. However, when I copy the example in the LoraModel docstring (peft/lora.py at main · huggingface/peft · GitHub), I get a TypeError (TypeError: LoraModel.__init__() missing 1 required positional argument: 'adapter_name'). When I try passing "lora" as the adapter name, I get another error.

I think I am fundamentally thinking about this the wrong way and would love some pointers. Neither Google nor Copilot Chat has been able to solve my problem.

dmckinno · May 25, 2023, 10:10pm

I figured this out. The solution is quite simple.

A PeftModelForCausalLM gives you access to the LoraModel methods, so you can call merged_model = peft_model.merge_and_unload() on it to get back a base model with the LoRA weights applied.
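For reference, "applying" the LoRA weights amounts to the standard LoRA update W' = W + (lora_alpha / r) * (B @ A). As far as I know, this is what PEFT's default LoRA layers compute when merging; variants such as rsLoRA scale differently. Below is a toy, pure-Python sketch with made-up sizes and values (no PEFT involved). It checks that the merged weight gives the same output as base + adapter, and that merging drops the extra A/B parameters:

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_scaled(a, b, scale):
    """Return a + scale * b for two matrices of the same shape."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def num_params(m):
    """Number of entries in a matrix."""
    return len(m) * len(m[0])


# Toy sizes and values, made up for illustration.
d_out, d_in, r, lora_alpha = 4, 3, 2, 4
scaling = lora_alpha / r  # standard LoRA scaling factor

W = [[float(i + j) for j in range(d_in)] for i in range(d_out)]        # frozen base weight, d_out x d_in
A = [[0.1 * (i + 1) * (j + 1) for j in range(d_in)] for i in range(r)]  # LoRA A, r x d_in
B = [[0.2 * (i - j) for j in range(r)] for i in range(d_out)]          # LoRA B, d_out x r
x = [[1.0], [2.0], [3.0]]                                               # one input column, d_in x 1

# Unmerged: base path plus the low-rank adapter path.
unmerged_out = add_scaled(matmul(W, x), matmul(B, matmul(A, x)), scaling)

# Merged: fold the update into W once; afterwards only W_merged is needed.
W_merged = add_scaled(W, matmul(B, A), scaling)
merged_out = matmul(W_merged, x)

print(num_params(W) + num_params(A) + num_params(B), "params with adapter")  # 26
print(num_params(W_merged), "params after merging")                          # 12
print(all(abs(u[0] - m[0]) < 1e-9 for u, m in zip(unmerged_out, merged_out)))  # True
```

Because the update is folded into the existing matrix, the merged model should end up with exactly the base model's parameter count.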

My IDE would not autocomplete merge_and_unload, so I assumed the method wasn't available. I still don't see where in the code this method is inherited, and would love for someone to point this out to me if feeling charitable.
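On where the method comes from: in the PEFT versions I know of, it is not inherited. PeftModel defines a __getattr__ that forwards any attribute it does not have itself to its wrapped base_model (the LoraModel). That dynamic lookup is also why IDEs cannot autocomplete it. Here is a minimal, library-free sketch of that delegation pattern. TunerModel and WrapperModel are hypothetical stand-ins for LoraModel and PeftModel, not the real classes:

```python
class TunerModel:
    """Stand-in for LoraModel: owns the merge logic."""

    def __init__(self, weights):
        self.weights = weights

    def merge_and_unload(self):
        # In PEFT this folds the LoRA deltas into the base weights;
        # here it just returns a marker so the forwarding is visible.
        return f"merged {len(self.weights)} weights"


class WrapperModel:
    """Stand-in for PeftModel: does not define merge_and_unload itself."""

    def __init__(self, base_model):
        self.base_model = base_model

    def __getattr__(self, name):
        # __getattr__ only runs when normal attribute lookup fails.
        # Guard against infinite recursion if base_model is missing.
        if name == "base_model":
            raise AttributeError(name)
        # Anything the wrapper lacks is looked up on the wrapped model.
        return getattr(self.base_model, name)


wrapper = WrapperModel(TunerModel(weights=[1.0, 2.0, 3.0]))
print(hasattr(WrapperModel, "merge_and_unload"))  # False: not defined on or inherited by the class
print(wrapper.merge_and_unload())                 # merged 3 weights (forwarded at runtime)
```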

Astropulse · June 20, 2023, 4:45pm

Looking to get a model file from a base + lora myself. Can you explain in more detail how you were able to do it?

dmckinno · June 22, 2023, 1:48am

Try this. Basic steps are to:
1/ load the base model
2/ wrap the base model with LoRA and train
3/ save the LoRA adapter
4/ reload the base model at half/full precision
5/ merge the LoRA weights with the base model
6/ save

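import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, Trainer, TrainingArguments
from peft import LoraConfig, PeftModel, get_peft_model, prepare_model_for_kbit_training

base_model_id = "base_model"  # Hub id or local path of the base model
lora_adapter = "./lora_adapter"  # directory for the trained adapter
peft_config = LoraConfig(task_type="CAUSAL_LM")  # plus your r, lora_alpha, target_modules, ...

# 1/ load the base model in 8-bit for training
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id, quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    torch_dtype=torch.float16, device_map="auto")
base_model = prepare_model_for_kbit_training(base_model)  # older PEFT: prepare_model_for_int8_training

# 2/ wrap with LoRA and train (train_dataset is your tokenized dataset)
peft_model = get_peft_model(base_model, peft_config)
training_args = TrainingArguments(output_dir="./results")
trainer = Trainer(model=peft_model, args=training_args, train_dataset=train_dataset)
trainer.train()

# 3/ save only the LoRA adapter
peft_model.save_pretrained(lora_adapter)

# 4/ reload the base model unquantized (fp16 here) and attach the adapter
base_fp16 = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype=torch.float16).to("cuda")
model_to_merge = PeftModel.from_pretrained(base_fp16, lora_adapter)

# 5/ merge the LoRA weights into the base weights, 6/ save the full model
merged_model = model_to_merge.merge_and_unload()
merged_model.save_pretrained("./merged_model")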
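A quick sanity check after the merge. This assumes PEFT's usual naming, where LoRA parameters contain lora_A / lora_B in their names (e.g. ...q_proj.lora_A.default.weight). find_lora_keys is a hypothetical helper, not part of PEFT. Also note that calling save_pretrained on model_to_merge (still a PeftModel) writes only the adapter files. The full weights come from saving the merge_and_unload() result.

```python
def find_lora_keys(state_dict_keys):
    """Return parameter names that still look like LoRA adapter weights (hypothetical helper)."""
    return [k for k in state_dict_keys if "lora_A" in k or "lora_B" in k]


# Illustrative key names in PEFT's usual style (not taken from a real checkpoint).
before_merge = [
    "base_model.model.model.layers.0.self_attn.q_proj.base_layer.weight",
    "base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.weight",
    "base_model.model.model.layers.0.self_attn.q_proj.lora_B.default.weight",
]
after_merge = ["model.layers.0.self_attn.q_proj.weight"]

print(len(find_lora_keys(before_merge)))  # 2
print(find_lora_keys(after_merge))        # []

# With the real models from the snippet above (assumed variable names):
# assert not find_lora_keys(merged_model.state_dict().keys())
```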

juaramos · July 22, 2023, 7:42am

Are these the correct steps to create a new model version from a base model plus my own fine-tuning?

ssm1990 · September 15, 2023, 1:17pm

A related question: why does the model size on disk almost double after merging, even though the number of parameters stays the same?

accOne996795 · January 21, 2024, 5:32am

But when I save model_to_merge, I still get only the adapter safetensors and not the safetensors for the entire model.
How do I get those?

Abhinav28 · January 31, 2024, 9:25am

I have the same question: why is there a size difference between the base model and the merged model?

ferrazzipietro · April 29, 2024, 2:48pm

I see two potential reasons for that, though I'm not sure either applies to your use case since I can't see your code.

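If the LoRA layers are still attached, the model has the parameters of the base model plus those of the inserted LoRA adapters, so # params merged_model > # params base_model. Note, though, that merge_and_unload() folds the LoRA update into the existing weight matrices and removes the adapter layers, so a properly merged model has the same parameter count as the base model. LoRA adapters are also usually a small fraction of the total, so this alone is unlikely to explain a near doubling.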
When reloading the model, make sure to pass the same data type as in training so that the size is maintained. For example, if the trained model was loaded in half precision, model_to_merge should also be loaded in half precision to get a comparable size, e.g.:

model_to_train = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
...
model_to_merge = PeftModel.from_pretrained(AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16), lora_adapter)

In the example provided by @dmckinno, the model used for training is loaded in 8 bits (load_in_8bit=True), and model_to_merge is then loaded in full precision, since no torch_dtype is passed and the default (float32) is used.
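The second reason matches the "almost doubling" reported above. Weight files take roughly (number of parameters) × (bytes per parameter), and float32 uses 4 bytes per parameter versus 2 for float16/bfloat16. So a half-precision base checkpoint that gets merged and saved in float32 should come out about twice as large. A back-of-the-envelope helper (hypothetical, not a transformers API; it ignores file headers and metadata):

```python
# Bytes per parameter for common storage dtypes.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def approx_weights_gb(num_params, dtype):
    """Rough weight-file size in GB (1 GB = 1e9 bytes); ignores headers and metadata."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


num_params = 1_300_000_000  # e.g. a ~1.3B-parameter model
for dtype in ("float16", "float32"):
    print(dtype, round(approx_weights_gb(num_params, dtype), 1), "GB")
```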

nielsr · May 1, 2024, 2:33pm

Hi,

I explain it in more detail here: Config.json is not saving after finetuning Llama 2 - #6 by nielsr. Hope you find it useful!

And this is a bit related: Further finetuning a LoRA finetuned CausalLM Model - #4 by nielsr

stefutz101 · June 20, 2024, 8:13am

Isn't the part about pushing this to the Hub missing here?

merged_model.push_to_hub("repo_id")
tokenizer.push_to_hub("repo_id")

P1sc3s007 · February 6, 2025, 7:10am

Why is my prepare_model_for_int8_training not defined?

Traceback (most recent call last):
  File "/root/autodl-tmp/train_pt.py", line 67, in <module>
    model = prepare_model_for_int8_training(model)
NameError: name 'prepare_model_for_int8_training' is not defined

from transformers import *
from peft import *
import torch
from datasets import load_dataset
import os
from torch.utils.data import DataLoader
from transformers import default_data_collator, get_linear_schedule_with_warmup
from tqdm import tqdm
from tensorboard import *

device = "cuda"
tokenizer_name_or_path = "LLM4Binary/llm4decompile-1.3b-v1.5"
model_name_or_path = "LLM4Binary/llm4decompile-1.3b-v1.5"
dataset_name = "asm2c"
text_column = "asm text"
label_column = "text_label"
max_length = 64
lr = 3e-2
num_epochs = 50
batch_size = 8

dataset = load_dataset("json", data_files="./traindata.jsonl")
dataset = dataset["train"].train_test_split(0.2)


tokenizer = AutoTokenizer.from_pretrained("LLM4Binary/llm4decompile-1.3b-v1.5")

def preprocess_function(examples):
    inputs = examples["input"]
    outputs = examples["output"]

    # merge the input and output columns
    merged_texts = [f"{input} {output_text}" for input, output_text in zip(inputs, outputs)]
    
    model_inputs = tokenizer(merged_texts, truncation=True, padding="max_length", max_length=512)
    model_inputs["labels"] = model_inputs["input_ids"].copy()  # set the labels
    return model_inputs

processed_datasets = dataset.map(
    preprocess_function,
    batched=True,
    num_proc=1,
    remove_columns=dataset["train"].column_names,
    load_from_cache_file=False,
    desc="Running tokenizer on dataset",
)

train_dataset = processed_datasets["train"]
eval_dataset = processed_datasets["test"]

peft_config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    prompt_tuning_init=PromptTuningInit.TEXT,
    num_virtual_tokens=8,
    prompt_tuning_init_text="What's the source code of this asm?",
    tokenizer_name_or_path=model_name_or_path,
)
checkpoint_name = f"{dataset_name}_{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}_v1.pt".replace(
    "/", "_"
)

# creating model
model = AutoModelForCausalLM.from_pretrained("LLM4Binary/llm4decompile-1.3b-v1.5", load_in_8bit=True, torch_dtype=torch.float16, device_map="auto")
model = prepare_model_for_int8_training(model)
peft_model = get_peft_model(model, peft_config)


training_args = TrainingArguments(
    output_dir="./results3",             # directory to save the model
    eval_strategy="epoch",               # evaluate every epoch (older transformers: evaluation_strategy)
    save_strategy="epoch",               # save the model at the end of every epoch
    learning_rate=2e-5,
    per_device_train_batch_size=4,       # batch size for training
    per_device_eval_batch_size=8,        # batch size for evaluation
    logging_steps=10,                    # logging frequency
    num_train_epochs=3,
    weight_decay=0.01,
    load_best_model_at_end=False
)


trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    #data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer, padding=True)
)

trainer.train()
'''
trainer.evaluate(eval_dataset)

# Manually save the model after training finishes
trainer.save_model(output_dir="./tuned_model")  # save the final model to the given directory
tokenizer.save_pretrained(save_directory="./tuned_tokenizer")  # save the tokenizer
'''
lora_adapter = "./lora_adapter"
peft_model.save_pretrained(lora_adapter)

model_to_merge = PeftModel.from_pretrained(AutoModelForCausalLM.from_pretrained(model_name_or_path).to("cuda"), lora_adapter)

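# NOTE: merge_and_unload() is meant for LoRA-style adapters. This script trains a
# prompt-tuning adapter (PromptTuningConfig), which adds virtual token embeddings
# rather than weight updates, so there is nothing to merge into the base weights.
merged_model = model_to_merge.merge_and_unload()
merged_model.save_pretrained("./merged_model")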
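The NameError is most likely a version issue: prepare_model_for_int8_training was deprecated in PEFT in favor of prepare_model_for_kbit_training and later removed. With `from peft import *` on a recent release, the old name is never defined. Importing prepare_model_for_kbit_training directly is the simplest fix. If a script has to run on both old and new PEFT versions, a small lookup can pick whichever name exists. resolve_first below is a hypothetical helper, not a PEFT API, and the demo uses a stand-in module so it runs without PEFT installed:

```python
import types


def resolve_first(module, *names):
    """Return the first attribute in `names` that `module` defines."""
    for name in names:
        fn = getattr(module, name, None)
        if fn is not None:
            return fn
    raise ImportError(f"none of {names} found in {getattr(module, '__name__', module)!r}")


# Real usage (assumes peft is installed):
# import peft
# prepare_model = resolve_first(peft, "prepare_model_for_kbit_training",
#                               "prepare_model_for_int8_training")
# model = prepare_model(model)

# Self-contained demo with a stand-in module that only has the new name.
fake_peft = types.ModuleType("fake_peft")
fake_peft.prepare_model_for_kbit_training = lambda model: ("prepared", model)

prepare_model = resolve_first(
    fake_peft, "prepare_model_for_kbit_training", "prepare_model_for_int8_training"
)
print(prepare_model("my_model"))
```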
