Hi @nielsr, I'm seeing a difference in the model's predictions before saving the model and after loading it back.
I'm fine-tuning Google's Gemma 2B model.
Please find the reproducible code below.
Here I'm fine-tuning the Gemma model on my dataset.
import os
import pandas as pd
import torch
import transformers
from trl import SFTTrainer
from peft import LoraConfig
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from datasets import Dataset, load_dataset

model_id = "google/gemma-2b"

# Quantize the base model to 4-bit NF4 with bfloat16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id, token=os.environ['HF_TOKEN'])
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map={"": 0}, token=os.environ['HF_TOKEN'])
data = pd.read_csv('train_data.csv')
train_df = Dataset.from_pandas(data)
lora_config = LoraConfig(
    r=8,
    target_modules=["q_proj", "o_proj", "k_proj", "v_proj", "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
trainer = SFTTrainer(
    model=model,
    train_dataset=train_df,
    dataset_text_field="text",
    max_seq_length=512,
    args=transformers.TrainingArguments(
        num_train_epochs=10,
        per_device_train_batch_size=4,
        gradient_accumulation_steps=16,
        warmup_steps=2,
        max_steps=10,  # note: max_steps overrides num_train_epochs when set
        learning_rate=2e-4,
        fp16=True,
        seed=12,
        logging_steps=1,
        output_dir="outputs",
        optim="paged_adamw_8bit",
    ),
    peft_config=lora_config,
)
trainer.train()
After training finished, I immediately tested the model with two examples, and it generated the expected outputs.
# Below is with example 1 input
text = "trained input example text 1"
device = "cuda:0"
inputs = tokenizer(text, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Below is with example 2 input
text = "trained input example text 2"
device = "cuda:0"
inputs = tokenizer(text, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
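For reference, this is the kind of deterministic check I use for the before/after comparison. It's just a sketch on my side: do_sample already defaults to False, and I put the model in eval mode so dropout can't introduce noise. greedy_generate is a hypothetical helper, not part of my original script.

# Hypothetical helper for the before/after comparison:
# eval mode disables dropout, and explicit greedy decoding rules out sampling noise.
def greedy_generate(m, prompt, device="cuda:0"):
    m.eval()
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        out = m.generate(**inputs, max_new_tokens=200, do_sample=False)
    return tokenizer.decode(out[0], skip_special_tokens=True)

print(greedy_generate(model, "trained input example text 1"))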
After checking the fine-tuned model's performance, I saved it with the step below:
trainer.save_model("finetuned_model")
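As far as I understand, save_model here only writes the LoRA adapter weights and config (not merged base weights), so I list the output folder as a quick sanity check on my side:

# Sanity check: I expect only adapter files (adapter_config.json, adapter weights) here,
# since the trainer wrapped the base model with a LoRA adapter.
import os
print(os.listdir("finetuned_model"))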
After saving the model, I restarted the kernel and loaded the fine-tuned model:
import os
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Reload the tokenizer after the kernel restart (same base tokenizer as before)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b", token=os.environ['HF_TOKEN'])

new_finetuned_model = AutoPeftModelForCausalLM.from_pretrained(
    "finetuned_model",
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map="cuda:0",
)
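Before generating, I also print the PEFT config to confirm the adapter really got attached after the restart (an extra sanity check I added; I'm assuming the loaded PeftModel exposes peft_config here):

# Extra sanity check (my addition): the loaded model should report the LoRA adapter config.
print(new_finetuned_model.peft_config)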
After loading the fine-tuned model, I tested it with the same example input and noticed that the generated output is different:
text = "trained input example text 1"
device = "cuda:0"
inputs = tokenizer(text, return_tensors="pt").to(device)
outputs = new_finetuned_model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
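One thing I'm unsure about: before saving, the base model was quantized to 4-bit NF4 with bfloat16 compute, while the reload above puts the base weights in float16. In case that matters, here is the variant reload I'm comparing against (a sketch on my side; I'm assuming AutoPeftModelForCausalLM forwards quantization_config to the base model loader):

# Variant reload (assumption): same 4-bit NF4 base as during training, with the adapter on top.
import torch
from transformers import BitsAndBytesConfig
from peft import AutoPeftModelForCausalLM

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
reloaded_4bit = AutoPeftModelForCausalLM.from_pretrained(
    "finetuned_model",
    quantization_config=bnb_config,
    device_map={"": 0},
)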
I'm not sure where I'm making a mistake. Could you please help me here?
Expected behavior
The model should generate the same answer before saving and after loading.