Save, load, and run inference with a fine-tuned model

Hi team,

I’m using the Hugging Face framework to fine-tune LLMs; currently I’m working with the Mistral model. I want to save the fine-tuned model, load it later, and run inference with it.

Since I’m new to the Hugging Face framework, I would appreciate your guidance on saving, loading, and inference.

I remember that in PyTorch we need the with torch.no_grad(): context manager for inference, but I’m not seeing anything like that in Hugging Face.

@nielsr Could you please guide me here?


import pandas as pd
import torch
from datasets import Dataset, load_dataset
from random import randrange
from peft import LoraConfig, get_peft_model, AutoPeftModelForCausalLM, prepare_model_for_kbit_training
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from trl import SFTTrainer
import warnings
warnings.filterwarnings("ignore")

df = pd.read_csv("train.csv")
train = Dataset.from_pandas(df)
model_id = "mistralai/Mistral-7B-Instruct-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)  # device_map is not a tokenizer argument
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

model = AutoModelForCausalLM.from_pretrained(model_id,
                                             quantization_config=BitsAndBytesConfig(load_in_4bit=True),
                                             torch_dtype=torch.float16,
                                             device_map="auto")
model.resize_token_embeddings(len(tokenizer))
model = prepare_model_for_kbit_training(model)  # model is loaded in 4-bit, not int8
peft_config = LoraConfig(
                          lora_alpha=16,
                          lora_dropout=0.1,
                          r=64,
                          bias="none",
                          task_type="CAUSAL_LM"
                        )
model = get_peft_model(model, peft_config)

args = TrainingArguments(
    output_dir='custom_domain',
    num_train_epochs=2, # adjust based on the data size
    per_device_train_batch_size=8, # reduce to 4 if you run out of GPU RAM
    optim = "adamw_torch",
    logging_steps = 100,
    save_total_limit = 2,
    save_strategy = "no",
    load_best_model_at_end=False,
    learning_rate=2e-4,
    fp16=not torch.cuda.is_bf16_supported(),
    bf16 = torch.cuda.is_bf16_supported(),
    evaluation_strategy="no", # no eval_dataset is passed to the trainer below
    seed=42,
    warmup_ratio = 0.1,
    lr_scheduler_type = "linear",
    report_to="none",
    torch_compile = True
    #dataloader_num_workers = 4
)

# Create the trainer
trainer = SFTTrainer(
    model=model,
    train_dataset=train,
    # eval_dataset=test,
    dataset_text_field='text',
    peft_config=peft_config,
    max_seq_length=512,
    tokenizer=tokenizer,
    args=args,
    packing=False,
)

trainer.train()

I couldn’t find a complete example in the Hugging Face documentation. Could you please point me to a relevant example?

Additionally, I noticed that the Mistral model fails to compile (torch_compile=True) during fine-tuning with the Hugging Face framework.

For inference, causal language models provide the generate() function, which is already wrapped in torch.no_grad, so you don’t need the context manager yourself.

tokenized_prompt = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**tokenized_prompt)
tokenizer.decode(outputs[0])

PEFT models require the input_ids argument to be passed explicitly:

input_ids= tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")
outputs = model.generate(input_ids=input_ids)
tokenizer.decode(outputs[0])

You can load a model with the from_pretrained function. Instead of the Hugging Face model_id, pass the path to the directory where you saved your model.

model = AutoModelForCausalLM.from_pretrained("path/to/saved_model")

Saving works via the save_pretrained() function.

model.save_pretrained("path/to/saved_model")

Since you have trained the model with PEFT, you can also save and load just the adapter. Here is a good guide on how to do this.
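
For illustration, here is a minimal sketch of what that can look like (the "mistral-lora-adapter" directory name is just a placeholder):

import torch
from peft import AutoPeftModelForCausalLM

# Save only the LoRA adapter weights (a few hundred MB instead of the full model)
trainer.model.save_pretrained("mistral-lora-adapter")
tokenizer.save_pretrained("mistral-lora-adapter")

# Later: load the base model and the adapter together in one call
model = AutoPeftModelForCausalLM.from_pretrained(
    "mistral-lora-adapter",
    torch_dtype=torch.float16,
    device_map="auto",
)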

EDIT:
You specified an output_dir in your TrainingArguments, which is where the Trainer saves your model. (Sorry, I never actually use the Trainer; I fine-tune in plain PyTorch.)
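
For example, a rough sketch (since your posted args set save_strategy="no", the Trainer won’t write checkpoints on its own, so you save explicitly after training):

trainer.train()

# With a PEFT model this writes the adapter weights and config into the folder
trainer.save_model("custom_domain")
tokenizer.save_pretrained("custom_domain")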

I’m seeing different methods for saving the fine-tuned model, and that confuses me:

Example 1: model.save_pretrained('./output/')
Example 2: trainer.save_model('./output/')
Example 3: trainer.model.save_pretrained('./output/')

and I’ve also seen examples that use merge_and_unload().

@nielsr, can you provide an example for a fine-tuned model?

Hi,

@CKeibel explained it well. If you’re using the Trainer API, you can specify an output_dir to which it will automatically save the model. You can specify the saving frequency in the TrainingArguments (like every epoch, every x steps, etc.).
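
For instance, a minimal sketch of those arguments (the step count is just an example value):

from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="custom_domain",  # checkpoints are written here
    save_strategy="steps",       # or "epoch" to save once per epoch
    save_steps=500,              # save every 500 optimizer steps
    save_total_limit=2,          # keep only the two most recent checkpoints
)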

Afterwards, you can load the model using the from_pretrained method, by specifying the path to the folder.
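
Roughly like this, assuming the Trainer wrote a checkpoint folder such as custom_domain/checkpoint-500 (adjust the path to whatever was actually saved):

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "custom_domain/checkpoint-500",  # path to the saved folder
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("custom_domain/checkpoint-500")

If the folder only contains the PEFT adapter, you can use peft’s AutoPeftModelForCausalLM instead, which pulls in the base model and applies the adapter on top.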

See this demo notebook, which showcases fine-tuning Mistral-7B and includes an inference section.
