Unable to load fine-tuned LLM

Hello everyone,
I have fine-tuned the Falcon 7B large language model on Google Colab with the QLoRA approach and pushed the model to the Hub using model.push_to_hub() after training. When I try to load the model, I get the following error: “DioulaD/falcon-7b-instruct-qlora-ge-dq-v2 does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack”. Can someone help a beginner like me debug this? :slight_smile:

Here is the code I am using to load the model:

model_id = "DioulaD/falcon-7b-instruct-qlora-ge-dq-v2"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

tk = AutoTokenizer.from_pretrained(model_id)
tk.pad_token = tk.eos_token
m = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map={"":0}, trust_remote_code=True)

Hi,

Looking at the “files and versions” tab here: DioulaD/falcon-7b-instruct-qlora-ge-dq-v2 at main, it indeed seems that you don’t have the pre-trained weights of the model included there (which would be in a file called pytorch_model.bin in case you’re using PyTorch). The repository only seems to include the adapter weights (in a file called adapter_model.bin). Hence, to load the full model, we need to do the following:

from transformers import AutoModelForCausalLM
from peft import PeftModel
import torch

# Load the pre-trained Falcon-7B base model
model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
# Load the QLoRA adapter weights on top of the base model
model = PeftModel.from_pretrained(model, "DioulaD/falcon-7b-instruct-qlora-ge-dq-v2")
# Merge the adapter into the base weights and drop the PEFT wrappers
model = model.merge_and_unload()

We first load the pre-trained Falcon-7B model from the appropriate repo on the Hub, then load the adapter weights from your repo, and then merge them into a single model.
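
If you want to avoid this two-step loading in the future, one option is to save (or push) the merged model as full weights, so that AutoModelForCausalLM.from_pretrained works on it directly. Here is a minimal sketch, assuming a local output directory name of your choice (the directory and repo names below are just examples):

from transformers import AutoTokenizer

# Hypothetical local directory; pick any path or Hub repo name you like
output_dir = "falcon-7b-instruct-qlora-merged"

# Save the merged weights plus the tokenizer so the folder is self-contained
model.save_pretrained(output_dir)
tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b")
tokenizer.save_pretrained(output_dir)

# Optionally push the full model to the Hub
# model.push_to_hub("your-username/falcon-7b-instruct-qlora-merged")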


See also this notebook for more info: finetune_falcon7b_oasst1_with_bnb_peft.ipynb · dfurman/falcon-7b-chat-oasst1-peft at main.

Hello @nielsr,
It now works perfectly. Thanks a lot :slight_smile:

One question connected with the code you provided, @nielsr: since the adapter was trained with quantization, shouldn't we also apply quantization when loading the pre-trained model before merging them?
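
For reference, what I have in mind is something like the sketch below, reusing the same bnb_config as in my original code. I'm not sure whether merging is supported on top of a 4-bit model, so I leave the adapter unmerged here:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load the base model quantized in 4-bit, as during QLoRA training
base = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
# Attach the adapter on top of the quantized base (no merge here)
model = PeftModel.from_pretrained(base, "DioulaD/falcon-7b-instruct-qlora-ge-dq-v2")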

Thanks in advance.