How do I evaluate a pretrained model on a test dataset?

Without training, how do I use a test dataset to evaluate a pretrained model?

Hello 🙂

Assuming you’re using PyTorch, you can wrap your model in a `Trainer`, pass your test set as `eval_dataset`, and call `trainer.evaluate()`. No training step is needed. An example (adapted from here):

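```python
import numpy as np
import evaluate  # metrics are loaded via `evaluate` (older `datasets` versions exposed `load_metric`)
from transformers import Trainer, TrainingArguments

# `model` is your pretrained model and `small_eval_dataset` your tokenized test set
training_args = TrainingArguments("test_trainer")  # "test_trainer" is the output directory

metric = evaluate.load("accuracy")


def compute_metrics(eval_pred):
    # eval_pred unpacks into the model's logits and the reference labels
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)


trainer = Trainer(
    model=model,
    args=training_args,
    # no train_dataset: we only evaluate
    eval_dataset=small_eval_dataset,
    compute_metrics=compute_metrics,
)

metrics = trainer.evaluate()
print(metrics)  # metric names are prefixed with "eval_", e.g. "eval_accuracy"
```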
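To see what `compute_metrics` receives, here is a small sketch that calls an equivalent accuracy function directly on a few made-up logits. It computes accuracy with plain NumPy instead of the `evaluate` metric; `compute_accuracy` and the numbers are illustrative only.

```python
import numpy as np


def compute_accuracy(eval_pred):
    # The Trainer passes an EvalPrediction, which unpacks like a (logits, labels) tuple
    logits, labels = eval_pred
    # The predicted class is the index of the largest logit in each row
    predictions = np.argmax(logits, axis=-1)
    return {"accuracy": float((predictions == labels).mean())}


# Hypothetical logits for 4 examples and 3 classes
logits = np.array([
    [2.0, 0.1, -1.0],  # argmax 0
    [0.2, 1.5, 0.3],   # argmax 1
    [0.0, 0.1, 3.0],   # argmax 2
    [1.0, 0.9, 0.8],   # argmax 0
])
labels = np.array([0, 1, 2, 1])  # the last label disagrees with its argmax

print(compute_accuracy((logits, labels)))  # {'accuracy': 0.75}
```

This is also a cheap way to sanity-check a `compute_metrics` function before running a full evaluation.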

If you don’t want to use a `Trainer`, have a look at the examples here, specifically the scripts whose names end in `_no_trainer.py`.
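For a rough idea of what evaluating without a `Trainer` involves, here is a minimal plain-PyTorch loop that computes accuracy. It is a sketch, not code taken from those scripts: it assumes each batch is a dict of tensors with a `"labels"` key, and that the model returns logits either directly or through a `.logits` attribute (as Hugging Face models do). `evaluate_accuracy` and `EchoClassifier` are hypothetical names for illustration.

```python
import torch
from torch.utils.data import DataLoader


@torch.no_grad()  # evaluation only: skip gradient tracking
def evaluate_accuracy(model, dataloader, device="cpu"):
    """Hypothetical helper: fraction of examples whose argmax logit matches the label."""
    model.to(device)
    model.eval()  # switch off dropout and other training-only behaviour
    correct, total = 0, 0
    for batch in dataloader:
        batch = {name: tensor.to(device) for name, tensor in batch.items()}
        labels = batch.pop("labels")
        outputs = model(**batch)
        # Hugging Face models return an output object; plain modules return a tensor
        logits = getattr(outputs, "logits", outputs)
        predictions = logits.argmax(dim=-1)
        correct += (predictions == labels).sum().item()
        total += labels.numel()
    return correct / total


class EchoClassifier(torch.nn.Module):
    """Toy stand-in for a pretrained model: its logits are just its input features."""

    def forward(self, features):
        return features


# Three toy examples with 2 classes; the last label deliberately disagrees
examples = [
    {"features": torch.tensor([2.0, 0.0]), "labels": torch.tensor(0)},
    {"features": torch.tensor([0.0, 3.0]), "labels": torch.tensor(1)},
    {"features": torch.tensor([1.0, 0.0]), "labels": torch.tensor(1)},
]
loader = DataLoader(examples, batch_size=2)  # default collation stacks each key
print(evaluate_accuracy(EchoClassifier(), loader))  # 2 of 3 correct
```

With a 🤗 Datasets test set you would typically tokenize it, make sure the label column is called `labels` (e.g. with `rename_column`), and call `set_format("torch")` on the tensor columns before building the `DataLoader`.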