Unable to load fine-tuned LLM

Hello everyone,
I have fine-tuned the Falcon 7B large language model on Google Colab with the QLoRA approach and pushed the model to the Hub using model.push_to_hub() after training. When I try to load the model, I get the following error: “DioulaD/falcon-7b-instruct-qlora-ge-dq-v2 does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack”. Can someone help a beginner like me debug this? :slight_smile:

Here is the code I am using to load the model:

model_id = "DioulaD/falcon-7b-instruct-qlora-ge-dq-v2"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

tk = AutoTokenizer.from_pretrained(model_id)
tk.pad_token = tk.eos_token
m = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map={"":0}, trust_remote_code=True)

Hi,

Looking at the “files and versions” tab here: DioulaD/falcon-7b-instruct-qlora-ge-dq-v2 at main, it indeed seems that you don’t have the pre-trained weights of the model included there (which would be in a file called pytorch_model.bin in case you’re using PyTorch). The repository only seems to include the adapter weights (in a file called adapter_model.bin). Hence, to load the full model, we need to do the following:

from transformers import AutoModelForCausalLM
from peft import PeftModel
import torch

# Load the pre-trained Falcon-7B base model
model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
# Load the QLoRA adapter weights on top of the base model
model = PeftModel.from_pretrained(model, "DioulaD/falcon-7b-instruct-qlora-ge-dq-v2")
# Merge the adapter into the base weights and drop the PEFT wrappers
model = model.merge_and_unload()

We first load the pre-trained Falcon-7B model from the appropriate repo on the Hub, then load the adapter weights from your repo, and then merge them into a single model.
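
If you want to avoid this two-step loading in the future, one option is to save (or push) the merged model as full weights, so that AutoModelForCausalLM.from_pretrained works on it directly. Here is a minimal sketch, assuming a local output directory name of your choice (the directory and repo names below are just examples):

from transformers import AutoTokenizer

# Hypothetical local directory; pick any path or Hub repo name you like
output_dir = "falcon-7b-instruct-qlora-merged"

# Save the merged weights plus the tokenizer so the folder is self-contained
model.save_pretrained(output_dir)
tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b")
tokenizer.save_pretrained(output_dir)

# Optionally push the full model to the Hub
# model.push_to_hub("your-username/falcon-7b-instruct-qlora-merged")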


See also this notebook for more info: finetune_falcon7b_oasst1_with_bnb_peft.ipynb · dfurman/falcon-7b-chat-oasst1-peft at main.

Hello @nielsr,
It now works perfectly. Thanks a lot :slight_smile:

One question connected with the code you provided, @nielsr: since the adapter was trained with quantization, shouldn't we also apply quantization when loading the pre-trained model before merging them?
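
For reference, what I have in mind is something like the sketch below, reusing the same bnb_config as in my original code. I'm not sure whether merging is supported on top of a 4-bit model, so I leave the adapter unmerged here:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load the base model quantized in 4-bit, as during QLoRA training
base = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
# Attach the adapter on top of the quantized base (no merge here)
model = PeftModel.from_pretrained(base, "DioulaD/falcon-7b-instruct-qlora-ge-dq-v2")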

Thanks in advance.