For inference, causal language models provide a generate() method, which already runs under torch.no_grad, so you don't need to wrap it yourself.
tokenized_prompt = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**tokenized_prompt)
tokenizer.decode(outputs[0])
PEFT models require you to pass input_ids explicitly instead of unpacking the full tokenizer output:
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")
outputs = model.generate(input_ids=input_ids)
tokenizer.decode(outputs[0])
You can load a model with the from_pretrained() function. Instead of the Hugging Face model ID, pass the path to the directory where you saved your model.
model = AutoModelForCausalLM.from_pretrained("path/to/model")
Saving works via the save_pretrained() function. Note that it expects a directory path, not a .pt file, because it writes the weights and config as separate files:
model.save_pretrained("path/to/model")
Since you trained the model with PEFT, you can also save and load only the adapter, which is much smaller than the full model. Here is a good guide on how to do this.
EDIT:
You specified an output_dir in your TrainingArguments, which is where the Trainer saves checkpoints during training, so that should take care of saving your model. (Sorry, I never actually use the Trainer; I fine-tune in plain PyTorch.)