I am new to Hugging Face and to model fine-tuning. I am working on a college project and followed the guide "How to Fine-Tune LLMs in 2024 with Hugging Face", but with my own dataset (it's public on my profile). I successfully fine-tuned the model (in Colab on an A100), but when I run this on my PC:
from transformers import pipeline
generator = pipeline("text-generation", model="stefutz101/code-llama-7b-databases-finetuned")
res = generator("how to select from database?", max_length=256)
print(res)
It doesn't work; it just gets stuck here:
C:\Users\stef_\AppData\Local\Programs\Python\Python311\Lib\site-packages\huggingface_hub\file_download.py:1132: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
warnings.warn(
Loading checkpoint shards: 0%|
I was also running out of space on my local C: drive, so I freed up some space.
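In case anyone else has the same disk problem: by default the downloaded models are cached under C:\Users\<user>\.cache\huggingface, and that cache can be moved to another drive with the HF_HOME environment variable. A minimal sketch (the D: path is just an example; it has to be set before transformers is imported, or as a system environment variable):
import os

# Point the Hugging Face cache at a drive with more free space;
# this must run before transformers / huggingface_hub are imported
os.environ["HF_HOME"] = "D:/hf_cache"

from transformers import pipeline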
Then I saw that the config.json file was missing from my repo on the Hub. After hours of reading I found out that I had to merge the LoRA adapter with the base model. I had already done that part the first time, but when I looked inside the folder generated in Colab there were more files than I had on the Hub, so I added the last two lines to the following code section:
### COMMENT IN TO MERGE PEFT AND BASE MODEL ####
import torch
from peft import AutoPeftModelForCausalLM

# Load the fine-tuned PEFT (LoRA) model on CPU
model = AutoPeftModelForCausalLM.from_pretrained(
    args.output_dir,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
)
# Merge the LoRA adapter into the base model and save the full model locally
merged_model = model.merge_and_unload()
merged_model.save_pretrained(args.output_dir, safe_serialization=True, max_shard_size="2GB")
# Push the merged model and the tokenizer (loaded earlier in the notebook) to the Hub
merged_model.push_to_hub("repo_id")
tokenizer.push_to_hub("repo_id")
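To make sure the merge really produced a full model (config.json plus the weight shards) before pushing, a quick sanity check like this should work; it's not part of the original guide, just something I would add:
import os
from transformers import AutoConfig

# If the merge worked, the output folder now contains config.json and the
# safetensors shards, so this loads without errors
print(os.listdir(args.output_dir))
config = AutoConfig.from_pretrained(args.output_dir)
print(config.model_type)  # should print "llama" for a Code Llama base model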
Now I am able to call it this way:
from transformers import pipeline
generator = pipeline("text-generation", model="repo_id")
res = generator("INSERT_QUESTION_HERE")
print(res)
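One more thing that might help when running a 7B model on a normal PC: loading the weights in half precision and limiting the number of generated tokens keeps memory use and runtime down. A rough sketch (device_map="auto" needs the accelerate package installed; the sampling settings are just examples):
import torch
from transformers import pipeline

# Half-precision weights roughly halve the memory footprint of the 7B model;
# device_map="auto" lets accelerate place layers on GPU/CPU as available
generator = pipeline(
    "text-generation",
    model="repo_id",
    torch_dtype=torch.float16,
    device_map="auto",
)

# max_new_tokens only counts the generated tokens, unlike max_length,
# which also counts the prompt
res = generator("how to select from database?", max_new_tokens=256, do_sample=True, temperature=0.7)
print(res[0]["generated_text"])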
Also, @swtb: no, after a while nothing happened and then it just stopped.