Download and load fine-tuned model locally (VS Code)

Hi everyone,
I need some help debugging my code. I have a fine-tuned model that I have tested in Colab, where it works perfectly. Here is the code I use to load and run the model.

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel, PeftConfig
import torch

def load_peft_model():
    peft_model_id = "DioulaD/falcon-7b-instruct-qlora-ge-dq-v2"

    # Load the base model, then attach the QLoRA adapter and merge it in
    model = AutoModelForCausalLM.from_pretrained(
        "tiiuae/falcon-7b-instruct",
        torch_dtype=torch.bfloat16,
        device_map="auto",
        trust_remote_code=True,
    )
    model = PeftModel.from_pretrained(model, peft_model_id)
    model = model.merge_and_unload()

    # The adapter config records which base model it was trained on,
    # so that base model's tokenizer is the right one to use
    config = PeftConfig.from_pretrained(peft_model_id)

    tknizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
    tknizer.pad_token = tknizer.eos_token
    return model, tknizer

def get_expectations(prompt, model, tknizer):
    """
    Convert a natural language query to Great Expectations methods using the fine-tuned Falcon 7B.
    Params:
        prompt : Natural language query
        model : Model downloaded from the Hugging Face Hub
        tknizer : Tokenizer from the PEFT model
    """
    # Assumes a CUDA device is available
    encoding = tknizer(prompt, return_tensors="pt").to("cuda:0")

    with torch.inference_mode():
        out = model.generate(
            input_ids=encoding.input_ids,
            attention_mask=encoding.attention_mask,
            max_new_tokens=100,
            do_sample=True,
            temperature=0.3,
            eos_token_id=tknizer.eos_token_id,
            top_k=0,
        )

    response = tknizer.decode(out[0], skip_special_tokens=True)
    return response.split("\n")[1]

The above code runs on Google Colab without any issue, but when I try to run it in VS Code I get errors.

ValueError: The current 'device_map' had weights offloaded to the disk. Please provide an 'offload_folder' for them. Alternatively, make sure you have 'safetensors' installed if the model you are using offers weights in this format

So I added an offload_folder:

model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b-instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
    offload_folder="offload_folder",
)

That fixed the first error, but then I got a different one, this time about offload_dir.

ValueError: We need an offload_dir to dispatch this model according to this 'device_map', the following submodules need to be offloaded: base_model.model.transformer.h.24, ........

I tried adding it to model = PeftModel.from_pretrained(model, peft_model_id) as well, but I am still getting the error.
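
For reference, this is roughly the change I tried (a sketch; the folder name is a placeholder, and whether PeftModel.from_pretrained forwards offload_folder may depend on the installed peft version):

model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b-instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
    offload_folder="offload_folder",
)
# Pass an offload folder to the adapter load as well, since the error
# complains about submodules of the PEFT-wrapped model
model = PeftModel.from_pretrained(
    model,
    "DioulaD/falcon-7b-instruct-qlora-ge-dq-v2",
    offload_folder="offload_folder",
)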

Has anyone come across these errors?

Apart from that, I want to ask if there is a way to avoid downloading the model every time I run the load_peft_model function, i.e. download it only once (on the first run) and reuse the downloaded model on subsequent runs. For instance, something like this:

import os

def load_peft_model():
    peft_model_id = "DioulaD/falcon-7b-instruct-qlora-ge-dq-v2"
    model_folder = "model_folder"  # Folder to store the downloaded model

    if not os.path.exists(model_folder):
        os.makedirs(model_folder)

    model_path = os.path.join(model_folder, "model")
    tokenizer_path = os.path.join(model_folder, "tokenizer")

    if os.path.exists(model_path) and os.path.exists(tokenizer_path):
        # Load the model and tokenizer from the stored files
        model = PeftModel.from_pretrained(model_path)
        tknizer = AutoTokenizer.from_pretrained(tokenizer_path)
    else:
        # Download and save the model and tokenizer
        model = AutoModelForCausalLM.from_pretrained(
            "tiiuae/falcon-7b-instruct",
            torch_dtype=torch.bfloat16,
            device_map="auto",
            trust_remote_code=True,
        )
        model = PeftModel.from_pretrained(model, peft_model_id)
        model = model.merge_and_unload()

        config = PeftConfig.from_pretrained(peft_model_id)

        tknizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
        tknizer.pad_token = tknizer.eos_token

        # Save the model and tokenizer to the specified folder
        model.save_pretrained(model_path)
        tknizer.save_pretrained(tokenizer_path)

    return model, tknizer

I have tried something like this, but it does not work.
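
Re-reading my snippet, I suspect the reload branch is the problem: after merge_and_unload() the saved checkpoint is a plain causal LM, not an adapter, so I assume it has to be reloaded with AutoModelForCausalLM rather than PeftModel. A sketch of the corrected branch (untested):

if os.path.exists(model_path) and os.path.exists(tokenizer_path):
    # The merged checkpoint is a regular model, so load it as one
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        torch_dtype=torch.bfloat16,
        device_map="auto",
        trust_remote_code=True,
    )
    tknizer = AutoTokenizer.from_pretrained(tokenizer_path)

That said, from_pretrained already caches downloads (under ~/.cache/huggingface by default), so subsequent runs should reuse the cached files even without saving a local copy.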
Thanks a lot.


Hi @DioulaD,
I came across this difficulty recently and have been browsing for a solution.

I tried a different approach on Colab (Pro+). I used the base model initially, like everyone does, fine-tuned it, and then saved the model on Colab along with its tokenizer. I zipped it and downloaded the archive so that I could use it on my local PC. The issue I am facing is that this fine-tuned, downloaded model is not working properly; it keeps giving me errors, and I am not sure why.
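
For reference, the save-and-download step was roughly this (a sketch from memory; the directory name is a placeholder):

# On Colab: save the fine-tuned model and its tokenizer, then zip for download
model.save_pretrained("my-finetuned-llama")
tokenizer.save_pretrained("my-finetuned-llama")

import shutil
shutil.make_archive("my-finetuned-llama", "zip", "my-finetuned-llama")
# Then download my-finetuned-llama.zip and unzip it on the local PC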

Error:

ValueError: Non-consecutive added token '' found. Should have index 32000 but has index 0 in saved vocabulary.

I tried to find solutions on Stack Overflow, GitHub, and Hugging Face, but I was not able to understand the issue.

Then I thought it must be the low GPU on my local system, so I ran the same code with the downloaded model on Colab. Colab gave me the same error.

@DioulaD, your approach seems logical to me; I will give it a try. Thank you for this.

  1. Any idea how to resolve this issue?
  2. Does anyone have another way to run a Hugging Face model on a local PC? (I have tried the HF API approach.)
  3. Which files of a model are essential when we want to run it on a local PC?

Note: The model that I am using in my code is Llama 7B.
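
One workaround I still want to try, since the error seems to point at inconsistent indices in the saved added_tokens.json: rebuild the tokenizer from the base checkpoint and re-save it over the downloaded one. A sketch (untested; both paths are placeholders):

from transformers import AutoTokenizer

base_id = "base-llama-7b"          # placeholder for the actual base repo
local_dir = "my-finetuned-llama"   # the unzipped download

# Rebuild the tokenizer from the base model and overwrite the saved files,
# in case added_tokens.json in the export has inconsistent indices
tokenizer = AutoTokenizer.from_pretrained(base_id, use_fast=True)
tokenizer.save_pretrained(local_dir)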


Hi,
Just wanted to write that even one year after this question, the issue is still relevant and unresolved. When googling for an answer, many related threads pop up on Stack Overflow, Hugging Face, and so on, but none of them contains a solution. There was one code snippet that could be found and tried, but it also did not resolve the issue (at least in my case).


The first code snippet worked as-is in my local environment. Of course, my VRAM is insufficient, so weights get offloaded; errors like the ones above may occur if there is not enough disk space for that.

model, tokenizer = load_peft_model()
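
And then, for example (the query text is just a placeholder):

# Example usage with the functions from the first post
print(get_expectations("Check that column age has no null values", model, tokenizer))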