Why does my fine-tuned model behave randomly at inference?

I have fine-tuned "bert-base-uncased" for paraphrase detection. During training I get 99.02% accuracy, but at inference the accuracy looks random. I've included the training code and inference code below. Please help me.
### Training Code
import numpy as np
from datasets import load_dataset
from transformers import AutoTokenizer, DataCollatorWithPadding
from transformers import TrainingArguments
from transformers import AutoModelForSequenceClassification
from transformers import Trainer
from peft import LoraConfig, get_peft_model ,TaskType

def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param:.2f}"
    )

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
print_trainable_parameters(model)

config = LoraConfig(
    r=32,
    lora_alpha=32,
    target_modules=["query", "value"],
    lora_dropout=0.1,
    bias="lora_only",
    modules_to_save=["decode_head"],
)
lora_model = get_peft_model(model, config)
print_trainable_parameters(lora_model)

raw_datasets = load_dataset("gokuls/glue_augmented_mrpc")
checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

def tokenize_function(example):
    return tokenizer(example["sentence1"], example["sentence2"], truncation=True)

tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

training_args = TrainingArguments(
    "test-trainer-lora",
    eval_strategy="epoch",
    num_train_epochs=20,  # 20 epochs; reduce for faster training
    learning_rate=5e-6,
    per_device_train_batch_size=16,
    save_strategy="no",
    save_steps=0,
)

trainer = Trainer(
    lora_model,
    training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,
)

trainer.train()

lora_model.save_pretrained("./lora_model")

predictions = trainer.predict(tokenized_datasets["validation"])

preds = np.argmax(predictions.predictions[1], axis=1)
labels = np.array(raw_datasets["validation"]["label"])

accuracy = (preds == labels).mean()
print(f"Model accuracy: {accuracy*100:.2f}%")
### Inference Code
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import PeftModel, PeftConfig
import numpy as np
import torch
from tqdm import tqdm
from datasets import load_dataset

config = PeftConfig.from_pretrained("./lora_model")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
inference_model = PeftModel.from_pretrained(model, "./lora_model")

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

raw_datasets = load_dataset("gokuls/glue_augmented_mrpc")

def is_paraphrase(sentence1, sentence2):
    inputs = tokenizer(sentence1, sentence2, return_tensors="pt")
    with torch.no_grad():
        outputs = inference_model(**inputs)
    predicted_class = torch.argmax(outputs.logits, dim=-1).item()
    return predicted_class

if __name__ == "__main__":
    raw_datasets = raw_datasets["validation"]
    print(raw_datasets.shape)
    correct = 0
    eval_len = len(raw_datasets)
    for i in tqdm(range(eval_len)):
        sentence1 = raw_datasets[i]["sentence1"]
        sentence2 = raw_datasets[i]["sentence2"]
        if is_paraphrase(sentence1, sentence2) == int(raw_datasets[i]["label"]):
            correct += 1
    print(f"Accuracy := {100*correct/eval_len:.2f}%")


It seems to become non-deterministic if you forget to call .eval(): in training mode the dropout layers stay active, so each forward pass can produce slightly different logits for the same input.
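As a minimal sketch, using the same inference setup as in your question (and assuming the adapter really was saved to ./lora_model by the training script):

from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import PeftModel
import torch

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
inference_model = PeftModel.from_pretrained(model, "./lora_model")
inference_model.eval()  # disable dropout so repeated forward passes give identical logits

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def is_paraphrase(sentence1, sentence2):
    inputs = tokenizer(sentence1, sentence2, return_tensors="pt")
    with torch.no_grad():  # no gradients needed at inference time
        outputs = inference_model(**inputs)
    return torch.argmax(outputs.logits, dim=-1).item()

Trainer.evaluate()/predict() put the model into eval mode for you, which is why the accuracy computed inside the training script looks fine while the hand-rolled inference loop does not.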