Can't load my model from pipeline

Hello,

I am new to Hugging Face and to model fine-tuning. I am working on a project for college and I found the guide How to Fine-Tune LLMs in 2024 with Hugging Face. I followed it, but with my own dataset (it's public on my profile). I successfully fine-tuned the model (in Colab using an A100), but when I run the following on my PC:

from transformers import pipeline

generator = pipeline("text-generation", model = "stefutz101/code-llama-7b-databases-finetuned")

res = generator("how to select from database?", max_length=256)

print(res)

It’s not working:

C:\Users\stef_\AppData\Local\Programs\Python\Python311\Lib\site-packages\huggingface_hub\file_download.py:1132: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
warnings.warn(
Loading checkpoint shards: 0%|

Also, I had some problems with torch: I had to install version 2.2.2 because the latest version could not find shm.dll. Not sure if this helps. This is the link to my model: stefutz101/code-llama-7b-databases-finetuned · Hugging Face
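In case it helps anyone with the same shm.dll error, this is roughly how I verified the install afterwards; the 2.2.2 pin is just what happened to work on my Windows machine, not a general fix:

# installed with: pip install torch==2.2.2  (the version that worked for me)
import torch
print(torch.__version__)  # expect 2.2.2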

I really don't know what the problem is; any hints would help. Thank you!

What actually happens? This looks like a generic warning (not an error). If you leave it running, does it continue?

Hello, two things happened:

  • I was running out of space on my local disk C:, so I freed up some space.
  • I saw that the config.json file was missing from the Hub. After hours of reading I found out that I have to merge the LoRA adapter with the base model. I had already done that part the first time too, but when I looked inside the folder generated by Colab there were more files than I had on the Hub, so I added the last two lines to the following code section:
### COMMENT IN TO MERGE PEFT AND BASE MODEL ####
import torch
from peft import AutoPeftModelForCausalLM

# Load the PEFT (LoRA) model on CPU
# (args.output_dir and tokenizer are defined earlier in the training script)
model = AutoPeftModelForCausalLM.from_pretrained(
    args.output_dir,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
)
# Merge the LoRA adapter into the base model and save the merged weights
merged_model = model.merge_and_unload()
merged_model.save_pretrained(args.output_dir, safe_serialization=True, max_shard_size="2GB")
# Push the merged model and tokenizer to the Hub (the two lines I added)
merged_model.push_to_hub("repo_id")
tokenizer.push_to_hub("repo_id")

Now I am able to call it this way:

from transformers import pipeline

generator = pipeline("text-generation", model = "repo_id")

res = generator("INSERT_QUESTION_HERE")
print(res)
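By the way, if you want to limit the output length like in my first snippet, generation arguments such as max_new_tokens can still be passed straight through the pipeline call (reusing the generator defined above):

res = generator("INSERT_QUESTION_HERE", max_new_tokens=256)
print(res)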

Also, @swtb: no, after a while nothing happened and then it just stopped.

I hope this helps someone.

