Download and load fine-tuned model locally (VS Code)

Hi everyone,
I need some help debugging my code. I have a fine-tuned model that I tested in Colab, where it works perfectly. Here is the code I use to load and run the model.

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel, PeftConfig
import torch

def load_peft_model():
    peft_model_id = "DioulaD/falcon-7b-instruct-qlora-ge-dq-v2"    
    model = AutoModelForCausalLM.from_pretrained(
            "tiiuae/falcon-7b-instruct",
            torch_dtype=torch.bfloat16,
            device_map="auto",
            trust_remote_code=True,
        )
    model = PeftModel.from_pretrained(model, peft_model_id)
    model = model.merge_and_unload()

    config = PeftConfig.from_pretrained(peft_model_id)

    tknizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
    tknizer.pad_token = tknizer.eos_token
    return model, tknizer

def get_expectations(prompt, model, tknizer):
  """
  Convert a natural language query to Great Expectations methods using the fine-tuned Falcon 7B.
  Params:
    prompt : Natural language query
    model : Model downloaded from the Hugging Face Hub
    tknizer : Tokenizer loaded from the PEFT config's base model
  """
  
  encoding = tknizer(prompt, return_tensors="pt").to("cuda:0")
  
  with torch.inference_mode():
    out = model.generate(
        input_ids=encoding.input_ids,
        attention_mask=encoding.attention_mask,
        max_new_tokens=100, do_sample=True, temperature=0.3,
        eos_token_id=tknizer.eos_token_id,
        top_k=0
    )

  response = tknizer.decode(out[0], skip_special_tokens=True)
  return response.split("\n")[1]
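For reference, here is how I call the two functions (the prompt below is just an example I made up for illustration):

model, tknizer = load_peft_model()
prompt = "Check that the passenger_count column only contains values between 1 and 6"
print(get_expectations(prompt, model, tknizer))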

The above code runs on Google Colab without any issue. I am trying to run it locally in VS Code but I am facing some errors.

ValueError: The current 'device_map' had weights offloaded to the disk. Please provide an 'offload_folder' for them. Alternatively, make sure you have 'safetensors' installed if the model you are using offers the weights in this format

I added offload_folder to the base model call:

model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b-instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
    offload_folder="offload_folder",
)

I then got a different error, this time about offload_dir.

ValueError: We need an offload_dir to dispatch this model according to this 'device_map', the following submodules need to be offloaded: base_model.model.transformer.h.24, ........

I tried adding it to model = PeftModel.from_pretrained(model, peft_model_id) but I am still getting the error.
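For reference, this is roughly what I tried. I am assuming offload_folder gets forwarded by peft to accelerate's dispatch_model, but maybe my peft version does not support that:

model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b-instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
    offload_folder="offload_folder",
)
# Pass the same folder again when attaching the adapter, since the
# dispatched PeftModel also needs a place to offload weights to.
model = PeftModel.from_pretrained(model, peft_model_id, offload_folder="offload_folder")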

Has anyone come across these errors?

Apart from that, I want to ask if there is a way to avoid downloading the model every time I run the load_peft_model function: download it only once (on the first run) and reuse the downloaded files on subsequent runs. For instance, something like this:

import os

def load_peft_model():
    peft_model_id = "DioulaD/falcon-7b-instruct-qlora-ge-dq-v2"
    model_folder = "model_folder"  # Folder to store the downloaded model

    if not os.path.exists(model_folder):
        os.makedirs(model_folder)

    model_path = os.path.join(model_folder, "model")
    tokenizer_path = os.path.join(model_folder, "tokenizer")

    if os.path.exists(model_path) and os.path.exists(tokenizer_path):
        # Load the model and tokenizer from the stored files
        model = PeftModel.from_pretrained(model_path)
        tknizer = AutoTokenizer.from_pretrained(tokenizer_path)
    else:
        # Download and save the model and tokenizer
        model = AutoModelForCausalLM.from_pretrained(
            "tiiuae/falcon-7b-instruct",
            torch_dtype=torch.bfloat16,
            device_map="auto",
            trust_remote_code=True,
        )
        model = PeftModel.from_pretrained(model, peft_model_id)
        model = model.merge_and_unload()

        config = PeftConfig.from_pretrained(peft_model_id)

        tknizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
        tknizer.pad_token = tknizer.eos_token

        # Save the model and tokenizer to the specified folder
        model.save_pretrained(model_path)
        tknizer.save_pretrained(tokenizer_path)

    return model, tknizer

I have tried something like this but it doesn't work.
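One thing I am wondering about: since merge_and_unload() returns a plain transformers model rather than a PeftModel, maybe the reload branch should use AutoModelForCausalLM instead of PeftModel? Something like this (untested sketch on my side, reusing model_path and tokenizer_path from the snippet above):

if os.path.exists(model_path) and os.path.exists(tokenizer_path):
    # After merge_and_unload() the saved weights are a regular
    # transformers checkpoint, so reload them with AutoModelForCausalLM.
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        torch_dtype=torch.bfloat16,
        device_map="auto",
        trust_remote_code=True,
    )
    tknizer = AutoTokenizer.from_pretrained(tokenizer_path)
    tknizer.pad_token = tknizer.eos_token

Also, as far as I understand, from_pretrained already caches downloads under ~/.cache/huggingface by default, so the weights should not be fully re-downloaded on every run anyway.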
Thanks a lot.

Hi @DioulaD,
I came across this difficulty recently and have been searching for a solution.

I tried a different approach on Colab (Pro+). I loaded the base model as usual, fine-tuned it, and then saved the model on Colab along with its tokenizer. I zipped the folder and downloaded it so that I could use it on my local PC. The issue I am facing is that this fine-tuned, downloaded model is not working properly; it keeps giving me errors, and I am not sure why.
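For context, this is roughly how I save everything on Colab and reload it locally (the folder name is just a placeholder):

# On Colab, after fine-tuning:
model.save_pretrained("finetuned_llama")
tokenizer.save_pretrained("finetuned_llama")

# On my local PC, after unzipping the downloaded folder:
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("finetuned_llama")
tokenizer = AutoTokenizer.from_pretrained("finetuned_llama")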

Error:

ValueError: Non-consecutive added token ‘’ found. Should have index 32000 but has index 0 in saved vocabulary.

I tried to find solutions on Stack Overflow, GitHub and Hugging Face, but I was not able to figure out the issue.

Then I thought it must be the limited GPU on my local system, so I tried to run the same code with the downloaded model on Colab. Colab gave me the same error as well.

@DioulaD, your approach looks logical to me. I will give it a try. Thank you for this.

  1. Any idea how to resolve this issue?
  2. Does anyone have another way to run a Hugging Face model on a local PC? (I have already tried the HF API solution.)
  3. Which files of a model are essential when we want to run it on a local PC?

Note: The model that I am using in my code is Llama 7B.