Trainer.evaluate() vs trainer.predict()

I am following the multilabel text classification tutorial from @nielsr located here: Transformers-Tutorials/Fine_tuning_BERT_(and_friends)_for_multi_label_text_classification.ipynb at master · NielsRogge/Transformers-Tutorials · GitHub

I currently have my dataset split into train, validation, and test sets. After training, trainer.evaluate() is called, which I believe runs on the validation dataset. My question is: how do I use the trained model to predict the labels on my test dataset? Do I just call trainer.predict() immediately after trainer.evaluate(), like so?

trainer = Trainer(
    model,
    args,
    train_dataset=encoded_dataset["train"],
    eval_dataset=encoded_dataset["validation"],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)
trainer.evaluate()
trainer.predict(encoded_dataset["test"])

Or can I just skip trainer.evaluate() and immediately go to trainer.predict() like so?

trainer = Trainer(
    model,
    args,
    train_dataset=encoded_dataset["train"],
    eval_dataset=encoded_dataset["validation"],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)
trainer.predict(encoded_dataset["test"])

Any help would be greatly appreciated. Thank you!


It depends on what you'd like to do: trainer.evaluate() will run prediction and compute metrics on the dataset you pass it (or on the eval_dataset given to the Trainer if you pass nothing), while trainer.predict() returns the raw predictions for the dataset you give it. However, if the test set also contains ground-truth labels, the latter will compute metrics as well.
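
For the multi-label setup in the notebook, trainer.predict() returns a named tuple with the raw logits, the label ids (if present), and a metrics dict. A minimal sketch of turning those logits into 0/1 labels; the variable names and the 0.5 threshold are assumptions, not something fixed by the notebook:

import numpy as np
from scipy.special import expit  # sigmoid

# predictions are raw logits of shape (num_examples, num_labels);
# label_ids and task metrics are only filled in if the test split has labels
output = trainer.predict(encoded_dataset["test"])
probs = expit(output.predictions)           # sigmoid for multi-label
pred_labels = (probs >= 0.5).astype(int)    # 0.5 threshold is an assumption
print(output.metrics)                       # task metrics appear only if labels exist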


Thanks for getting back to me. Maybe my question is more about what's happening inside trainer.train() and the difference between validation and prediction.

After every training epoch (at least the way it is set up in the tutorial notebook), isn't the model already being evaluated against the validation dataset? So why is trainer.evaluate() run on the validation dataset again afterwards? Wouldn't you want that to be the test dataset?
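
For context, the notebook sets up per-epoch evaluation roughly like this (a sketch; the output directory and epoch count here are assumptions, only evaluation_strategy matters for the question):

from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="bert-multilabel",       # hypothetical output path
    evaluation_strategy="epoch",        # trainer.evaluate() runs on eval_dataset after every epoch
    save_strategy="epoch",
    num_train_epochs=3,
)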

Hi! I have encountered the same problem when running the same notebook. Did you manage to find the answer?

I solved the problem by passing the test dataset via the eval_dataset argument: trainer.evaluate(eval_dataset=encoded_dataset["test"])
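
In full, that looks something like the following; metric_key_prefix is optional and just renames the keys in the returned dict (a sketch, assuming the Trainer from the notebook):

# final evaluation pass on the held-out test split
test_metrics = trainer.evaluate(
    eval_dataset=encoded_dataset["test"],
    metric_key_prefix="test",   # keys come back as "test_loss", "test_f1", ...
)
print(test_metrics)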


You can also set your test dataset as the eval dataset on the fly once training has completed, then run evaluate():

trainer.eval_dataset = encoded_dataset["test"]
trainer.evaluate()