I am using a pre-trained Transformer for sequence classification (distilbert-base-cased), which I fine-tuned on my dataset with the Trainer class. When I evaluate the model through the Trainer class, I get an accuracy of 94%:
import numpy as np
import evaluate
from transformers import Trainer

trainer = Trainer(model=model)
preds = trainer.predict(validation_dataset)
predictions = np.argmax(preds.predictions, axis=-1)

metric = evaluate.load("accuracy")
metric.compute(predictions=predictions, references=preds.label_ids)
# prints: {'accuracy': 0.9435554514341591}
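For context, the validation set was prepared roughly along these lines (a sketch rather than my exact code; the CSV file name, the "text"/"label" column names, and the padding settings are placeholders for my actual setup):

from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased")

def tokenize(batch):
    # pad/truncate every example to the same length (placeholder settings)
    return tokenizer(batch["text"], padding="max_length", truncation=True)

dataset = load_dataset("csv", data_files={"validation": "validation.csv"})
validation_dataset = dataset["validation"].map(tokenize, batched=True)
# return PyTorch tensors so each row's input_ids is a torch.Tensor
validation_dataset.set_format("torch", columns=["input_ids", "attention_mask", "label"])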
However, when I tried to get the predictions directly from the model, the accuracy was only around 86%:
import torch

predictions = []
labels = []
model.eval()
with torch.no_grad():
    for row in validation_dataset:
        # add a batch dimension and run a single example through the model
        text_ids = row['input_ids'].unsqueeze(0)
        # model(...)[0] is the logits tensor of shape (1, num_labels)
        predicted = torch.argmax(model(text_ids)[0], dim=-1)
        predictions.append(predicted.item())
        labels.append(row['label'])  # assuming the label column is named 'label'

metric.compute(predictions=predictions, references=labels)
# prints: {'accuracy': 0.8639942552151239}
I wonder why the predictions from the Trainer and from the model are different. And additionally, why is the accuracy of the predictions from the Trainer so much higher? Am I missing something, or is this an indication of a bad implementation?