How to properly load the PEFT LoRA model

arvis · August 22, 2023, 8:38am

I used PEFT LoRA + Trainer to fine-tune a model.

I encountered an issue where the predictions of the fine-tuned model after training and the predictions after loading the model again are different.

I’d like to inquire about how to save the model in a way that allows consistent prediction results when the model is loaded.

Here’s my code. Thank you for your assistance.

# This code loads the saved adapter and predicts on the test set
from peft import PeftConfig, PeftModel
from transformers import AutoModelForMultipleChoice, Trainer

config = PeftConfig.from_pretrained('./deberta_adapter')
model = AutoModelForMultipleChoice.from_pretrained(config.base_model_name_or_path, return_dict=True)
model = PeftModel.from_pretrained(model, './deberta_adapter', device_map="auto")

trainer = Trainer(
    model=model,
    data_collator=DataCollatorForMultipleChoice(tokenizer=tokenizer)
)

test_predictions = trainer.predict(tokenized_test_dataset).predictions
test_predictions[:4]

array([[-0.2088623 , -0.2109375 , -0.2088623 , -0.21069336, -0.20812988],
       [-0.20654297, -0.20532227, -0.20178223, -0.20324707, -0.20385742],
       [-0.20751953, -0.20983887, -0.20947266, -0.21105957, -0.20874023],
       [-0.21350098, -0.21508789, -0.21435547, -0.21533203, -0.2154541 ]],
      dtype=float32)

# This code predicts on the test set right after trainer.train() finishes
trainer = Trainer(
    model=model,
    args=training_args,
    tokenizer=tokenizer,
    data_collator=DataCollatorForMultipleChoice(tokenizer=tokenizer),
    train_dataset=train_tokenized_dataset,
    eval_dataset=test_tokenized_dataset,
    compute_metrics=compute_metrics,
)

trainer.train()

test_predictions = trainer.predict(tokenized_test_dataset).predictions
test_predictions[:4]

array([[ 0.51416016,  0.9223633 ,  0.91748047,  1.0332031 ,  0.10601807],
       [ 0.58447266,  0.70703125,  0.7138672 ,  0.8330078 ,  0.5136719 ],
       [ 1.2412109 ,  0.80078125,  1.28125   ,  0.15576172,  0.9501953 ],
       [-0.12115479, -0.7988281 , -0.75097656, -1.1914062 , -1.4853516 ]],
      dtype=float32)

Additionally, if I do the following, I get approximately the same predictions, but I can’t find anyone on the internet loading a PEFT model this way.

# Save the full state dict of the trained PEFT model
torch.save(trainer.model.state_dict(), "model.pt")

# Rebuild the same PEFT-wrapped architecture, then restore every saved weight
model = AutoModelForMultipleChoice.from_pretrained(model_name)
peft_config = LoraConfig(
    r=16, lora_alpha=32, task_type=TaskType.SEQ_CLS, lora_dropout=0.1,
    inference_mode=False
)
model = get_peft_model(model, peft_config)
model.load_state_dict(torch.load("model.pt"))

trainer = Trainer(
    model=model,
    data_collator=DataCollatorForMultipleChoice(tokenizer=tokenizer)
)
test_predictions = trainer.predict(tokenized_test_dataset).predictions
test_predictions[:4]

array([[ 0.51416016,  0.9223633 ,  0.9169922 ,  1.0332031 ,  0.10644531],
       [ 0.58447266,  0.70703125,  0.7138672 ,  0.83251953,  0.5131836 ],
       [ 1.2412109 ,  0.80078125,  1.28125   ,  0.15673828,  0.94970703],
       [-0.12109375, -0.7988281 , -0.7504883 , -1.1914062 , -1.4853516 ]],
      dtype=float32)

z13546674246 · August 17, 2024, 5:16pm

Hello, have you figured out how to solve this? Thanks!

nielsr · August 19, 2024, 12:42pm

It’s recommended to use the save_pretrained and from_pretrained methods rather than torch.save and torch.load.

Refer to this guide which showcases fine-tuning + inference: LoRA methods

huggingzob · October 7, 2024, 4:01pm

I am also confused about this. I tried two different approaches, but I am not sure whether both are correct. Comparing the tensors after loading the models shows some differences between them. Any suggestions?


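import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base_path = "/model/base"
adapter_path = "/model/adapter"

# The first method: this works when the PEFT library is installed
model1 = AutoModelForCausalLM.from_pretrained(adapter_path, torch_dtype=torch.bfloat16, device_map="cuda")

# The second method: load the base model first, then attach the adapter
# (PeftModel.from_pretrained expects a model object, not a path, as its first argument)
base_model = AutoModelForCausalLM.from_pretrained(base_path, torch_dtype=torch.bfloat16, device_map="cuda")
model2 = PeftModel.from_pretrained(base_model, adapter_path)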
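
One way to narrow down where two loaded models differ is to compare their state dicts tensor by tensor. The sketch below uses a hypothetical helper, `diff_state_dicts`, and demonstrates it on two toy `nn.Linear` layers rather than on the models from this post. Keep in mind that a model wrapped in `PeftModel` prefixes its parameter names (for example with `base_model.model.`), so the names from the two loading methods may not line up directly; comparing model outputs on the same input may be the simpler check in that case.

```python
import torch


def diff_state_dicts(state_a, state_b, atol=1e-6):
    """Return (names only in a, names only in b, shared names whose tensors differ)."""
    only_a = sorted(set(state_a) - set(state_b))
    only_b = sorted(set(state_b) - set(state_a))
    differing = []
    for name in sorted(set(state_a) & set(state_b)):
        a = state_a[name].float().cpu()
        b = state_b[name].float().cpu()
        if a.shape != b.shape or not torch.allclose(a, b, atol=atol):
            differing.append(name)
    return only_a, only_b, differing


# Toy demonstration: two identical layers, then perturb one bias.
torch.manual_seed(0)
m1 = torch.nn.Linear(4, 2)
m2 = torch.nn.Linear(4, 2)
m2.load_state_dict(m1.state_dict())
with torch.no_grad():
    m2.bias.add_(1.0)

only_a, only_b, differing = diff_state_dicts(m1.state_dict(), m2.state_dict())
print(only_a, only_b, differing)
```

With bfloat16 weights, small numerical differences are expected, so a looser `atol` may be appropriate.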

JuyiLin · April 13, 2025, 5:08am

I also ran into a problem.

    vla_lora = PeftModel.from_pretrained(base_vla, model_id=cfg.pretrained_checkpoint, subfolder="lora_adapter") # problematic
    vla_lora = vla_lora.merge_and_unload() # are the trained parameters not loaded? The result has no attribute 'print_trainable_parameters'

The approach below gives a different result. Why?

    # from peft import get_peft_model, LoraConfig, TaskType
    # lora_rank = 32
    # lora_config = LoraConfig(
    #             r=lora_rank,
    #             # lora_alpha=min(cfg.lora_rank, 16),
    #             lora_alpha=16,  # Xuan: usually, lora_alpha = 2 * lora_rank
    #             lora_dropout=0.0,
    #             target_modules="all-linear",
    #             init_lora_weights="gaussian",
    #         )
    # vla_lora = get_peft_model(base_vla, lora_config)
    # vla_lora.load_adapter(adapter_dir, adapter_name="default") #
