Handling a PEFT model the right way (save, load, inference)

I have fine-tuned the Falcon-7B-Instruct model with the peft library using LoRA/QLoRA.

The questions are as follows:

  1. I’m not sure whether I saved it correctly, and I would appreciate help with that.
  2. When I call save_pretrained after lora_model.merge_and_unload() and then try to load the result, I get safetensors_rust.SafetensorError: Error while deserializing header: InvalidHeaderDeserialization.
  3. Once 1 and 2 are resolved, I want to integrate the model with LangChain, but I’m hitting errors there too.

Saving the model is done like so:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# load the quantized base model and attach the trained LoRA adapter
model = AutoModelForCausalLM.from_pretrained(
    hf_model_name, quantization_config=bnb_config, device_map={"": 0}, trust_remote_code=False
)
lora_model = PeftModel.from_pretrained(model, model_final_path, local_files_only=True)
tok = AutoTokenizer.from_pretrained(hf_model_name, trust_remote_code=False)
tok.pad_token = tok.eos_token

# merge the adapter into the base weights, then save
lora_model.merge_and_unload()
lora_model.save_pretrained(after_merge_model_path, save_adapter=True, save_config=True)
tok.save_pretrained(after_merge_model_path)
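
As far as I can tell from the PEFT docs, merge_and_unload() returns the merged base model, and PeftModel.save_pretrained() only ever writes the adapter weights, so my guess (untested) is that the save step should capture the return value, roughly like this sketch:

merged_model = lora_model.merge_and_unload()           # keep the merged base model that is returned
merged_model.save_pretrained(after_merge_model_path)   # save the full merged weights, not just the adapter
tok.save_pretrained(after_merge_model_path)

Is that the intended usage, or is the problem somewhere else?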

Loading it is done like this:

local_model = AutoModelForCausalLM.from_pretrained(after_merge_model_path)
local_tokenizer = AutoTokenizer.from_pretrained(after_merge_model_path)
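
Once the saving issue is sorted out, my plan for inference (assuming fp16 on a single GPU is acceptable) would be something like this sketch:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

local_model = AutoModelForCausalLM.from_pretrained(
    after_merge_model_path,
    torch_dtype=torch.float16,  # assumption: fp16 is fine for inference
    device_map="auto",          # let accelerate place the weights on the GPU
)
local_tokenizer = AutoTokenizer.from_pretrained(after_merge_model_path)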


The eventual goal is to use this model in my Streamlit application with LangChain, so I’ll probably also need to store the full merged model somewhere.
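
For reference, this is roughly how I was planning to wire it into LangChain (just a sketch; I’m assuming that wrapping a transformers text-generation pipeline in HuggingFacePipeline is the right approach, and the prompt/chain below are only placeholders):

from transformers import pipeline
from langchain.llms import HuggingFacePipeline
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# wrap the locally loaded model/tokenizer in a text-generation pipeline
pipe = pipeline(
    "text-generation",
    model=local_model,
    tokenizer=local_tokenizer,
    max_new_tokens=256,
)

llm = HuggingFacePipeline(pipeline=pipe)

# minimal chain just to check the integration works
prompt = PromptTemplate.from_template("Answer the question: {question}")
chain = LLMChain(llm=llm, prompt=prompt)
print(chain.run(question="What does this model do?"))

Is this the right way to hand a locally merged model to LangChain, or should it be loaded differently for that?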