Handling a PEFT model the right way (save, load, inference)

I have fine-tuned the Falcon-7B-Instruct model with the peft library using LoRA/QLoRA.

The questions are as follows:

  1. I’m not sure whether I saved it correctly, and I would appreciate help with that.
  2. When I call save_pretrained after lora_model.merge_and_unload() and then try to load the result, I get safetensors_rust.SafetensorError: Error while deserializing header: InvalidHeaderDeserialization.
  3. Once 1 and 2 are resolved, I want to integrate the model with LangChain, but I’m hitting errors there too.

Saving the model is done like so:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# load the quantized base model and attach the trained LoRA adapter
model = AutoModelForCausalLM.from_pretrained(
    hf_model_name, quantization_config=bnb_config, device_map={"": 0}, trust_remote_code=False
)
lora_model = PeftModel.from_pretrained(model, model_final_path, local_files_only=True)
tok = AutoTokenizer.from_pretrained(hf_model_name, trust_remote_code=False)
tok.pad_token = tok.eos_token

# merge the adapter into the base weights, then save
lora_model.merge_and_unload()
lora_model.save_pretrained(after_merge_model_path, save_adapter=True, save_config=True)
tok.save_pretrained(after_merge_model_path)
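
As far as I can tell from the PEFT docs, merge_and_unload() returns the merged base model, and PeftModel.save_pretrained() only ever writes the adapter weights, so my guess (untested) is that the save step should capture the return value, roughly like this sketch:

merged_model = lora_model.merge_and_unload()           # keep the merged base model that is returned
merged_model.save_pretrained(after_merge_model_path)   # save the full merged weights, not just the adapter
tok.save_pretrained(after_merge_model_path)

Is that the intended usage, or is the problem somewhere else?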

Loading it is done like this:

local_model = AutoModelForCausalLM.from_pretrained(after_merge_model_path)
local_tokenizer = AutoTokenizer.from_pretrained(after_merge_model_path)
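
Once the saving issue is sorted out, my plan for inference (assuming fp16 on a single GPU is acceptable) would be something like this sketch:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

local_model = AutoModelForCausalLM.from_pretrained(
    after_merge_model_path,
    torch_dtype=torch.float16,  # assumption: fp16 is fine for inference
    device_map="auto",          # let accelerate place the weights on the GPU
)
local_tokenizer = AutoTokenizer.from_pretrained(after_merge_model_path)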


The eventual goal is to use this model in my Streamlit application with LangChain, so I’ll probably also need to store the full merged model somewhere.
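
For reference, this is roughly how I was planning to wire it into LangChain (just a sketch; I’m assuming that wrapping a transformers text-generation pipeline in HuggingFacePipeline is the right approach, and the prompt/chain below are only placeholders):

from transformers import pipeline
from langchain.llms import HuggingFacePipeline
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# wrap the locally loaded model/tokenizer in a text-generation pipeline
pipe = pipeline(
    "text-generation",
    model=local_model,
    tokenizer=local_tokenizer,
    max_new_tokens=256,
)

llm = HuggingFacePipeline(pipeline=pipe)

# minimal chain just to check the integration works
prompt = PromptTemplate.from_template("Answer the question: {question}")
chain = LLMChain(llm=llm, prompt=prompt)
print(chain.run(question="What does this model do?"))

Is this the right way to hand a locally merged model to LangChain, or should it be loaded differently for that?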