I am using a pre-trained Transformer for sequence classification (distilbert-base-cased), which I fine-tuned on my dataset with the Trainer class. When I evaluate the model through the Trainer class, I get an accuracy of 94%:
import numpy as np
import evaluate
from transformers import Trainer

trainer = Trainer(model=model)
preds = trainer.predict(validation_dataset)
predictions = np.argmax(preds.predictions, axis=-1)

metric = evaluate.load("accuracy")
metric.compute(predictions=predictions, references=preds.label_ids)
# prints: {'accuracy': 0.9435554514341591}
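For context, the validation set was prepared roughly along these lines (a sketch rather than my exact code; the CSV file name, the "text"/"label" column names, and the padding settings are placeholders for my actual setup):

from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased")

def tokenize(batch):
    # pad/truncate every example to the same length (placeholder settings)
    return tokenizer(batch["text"], padding="max_length", truncation=True)

dataset = load_dataset("csv", data_files={"validation": "validation.csv"})
validation_dataset = dataset["validation"].map(tokenize, batched=True)
# return PyTorch tensors so each row's input_ids is a torch.Tensor
validation_dataset.set_format("torch", columns=["input_ids", "attention_mask", "label"])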
However, when I tried to get the predictions directly from the model, the accuracy was only around 86%:
import torch

predictions = []
labels = []
model.eval()
with torch.no_grad():
    for row in validation_dataset:
        # add a batch dimension and run a single example through the model
        text_ids = row['input_ids'].unsqueeze(0)
        # model(...)[0] is the logits tensor of shape (1, num_labels)
        predicted = torch.argmax(model(text_ids)[0], dim=-1)
        predictions.append(predicted.item())
        labels.append(row['label'])  # assuming the label column is named 'label'

metric.compute(predictions=predictions, references=labels)
# prints: {'accuracy': 0.8639942552151239}
I wonder why the predictions from the Trainer and from the model are different. And additionally, why is the accuracy of the predictions from the Trainer so much higher? Am I missing something, or is this an indication of a bad implementation?