Save, load, and do inference with a fine-tuned model

For inference, causal language models provide the generate() method, which already runs under torch.no_grad().

tokenized_prompt = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**tokenized_prompt)
tokenizer.decode(outputs[0])
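
For completeness, here is a minimal self-contained sketch of that call; the model name "gpt2", the prompt, and max_new_tokens are just placeholders for your own setup:

from transformers import AutoModelForCausalLM, AutoTokenizer

# placeholder model; substitute your own fine-tuned checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2").to("cuda")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

prompt = "Hello, my name is"
tokenized_prompt = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**tokenized_prompt, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))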

PEFT models require the input to be passed explicitly as the input_ids keyword argument:

input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")
outputs = model.generate(input_ids=input_ids)
tokenizer.decode(outputs[0])

You can load a model with the from_pretrained() function. Instead of the Hugging Face model_id, pass the path to the directory where you saved the model.

model = AutoModelForCausalLM.from_pretrained("path/to/saved_model")

Saving works via the save_pretrained() function.

model.save_pretrained("path/to/saved_model")

Since you trained the model with PEFT, you can also save and load just the adapter. Here is a good guide on how to do this.
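
If you want to go the adapter route, a rough sketch (assuming a reasonably recent peft version; the adapter path is a placeholder):

from peft import AutoPeftModelForCausalLM

# for a PeftModel, save_pretrained() stores only the adapter weights and config
model.save_pretrained("path/to/adapter")

# AutoPeftModelForCausalLM reloads the base model and applies the adapter on top
model = AutoPeftModelForCausalLM.from_pretrained("path/to/adapter").to("cuda")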

EDIT:
You specified an output_dir in your TrainingArguments, which should take care of saving your model. (Sorry, I never actually use the Trainer; I fine-tune in plain PyTorch.)
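
If you stick with the Trainer, something along these lines should work (just a sketch, assuming a trainer built from these arguments; the output path is a placeholder):

from transformers import TrainingArguments

training_args = TrainingArguments(output_dir="path/to/output")

# the Trainer writes checkpoints to output_dir during training;
# save_model() additionally writes the final model there
trainer.save_model()

# reload it later just like any other pretrained model
model = AutoModelForCausalLM.from_pretrained("path/to/output")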
