For inference, causal language models provide a generate() method, which already runs under torch.no_grad, so you don't need to wrap it yourself.
tokenized_prompt = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**tokenized_prompt)
tokenizer.decode(outputs[0])
PEFT models require you to pass input_ids explicitly instead of unpacking the full tokenizer output:
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")
outputs = model.generate(input_ids=input_ids)
tokenizer.decode(outputs[0])
You can load a model with the from_pretrained() function. Instead of the Hugging Face model ID, pass the path to the directory where you saved your model.
model = AutoModelForCausalLM.from_pretrained("path/to/model")
Saving works via the save_pretrained() function. Note that it expects a directory path, not a .pt file, because it writes the weights and config as separate files:
model.save_pretrained("path/to/model")
Since you trained the model with PEFT, you can also save and load only the adapter, which is much smaller than the full model. Here is a good guide on how to do this.
EDIT:
You specified an output_dir in your TrainingArguments, which is where the Trainer saves checkpoints during training, so that should take care of saving your model. (Sorry, I never actually use the Trainer; I fine-tune in plain PyTorch.)