trainer.evaluate() vs. trainer.predict()

I am following the multi-label text classification tutorial from @nielsr located here: Transformers-Tutorials/Fine_tuning_BERT_(and_friends)_for_multi_label_text_classification.ipynb at master · NielsRogge/Transformers-Tutorials · GitHub

I currently have my dataset split into train, validation, and test sets. After training, trainer.evaluate() is called, which I believe runs on the validation dataset. My question is: how do I use the model I trained to predict the labels on my test dataset? Do I just call trainer.predict() immediately after trainer.evaluate(), like so?

trainer = Trainer(
    model,
    args,
    train_dataset=encoded_dataset["train"],
    eval_dataset=encoded_dataset["validation"],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)
trainer.evaluate()
trainer.predict(encoded_dataset["test"])

Or can I just skip trainer.evaluate() and go straight to trainer.predict(), like so?

trainer = Trainer(
    model,
    args,
    train_dataset=encoded_dataset["train"],
    eval_dataset=encoded_dataset["validation"],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)
trainer.predict(encoded_dataset["test"])

Any help would be greatly appreciated. Thank you!

It depends on what you'd like to do: trainer.evaluate() will predict and compute metrics on your test set (if you pass it as the dataset to evaluate), while trainer.predict() will only predict labels on it. However, if the test set also contains ground-truth labels, the latter will compute metrics as well.
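
For concreteness, here is a minimal sketch of what inspecting both calls might look like, assuming a trainer configured as above. trainer.predict() returns a PredictionOutput named tuple with predictions (the raw logits), label_ids, and metrics; the 0.5 cutoff below for turning multi-label logits into predicted labels is an illustrative assumption, not something fixed by the API.

import torch

# evaluate() returns a dict of metrics computed on the dataset it is given
metrics = trainer.evaluate()
print(metrics)

# predict() returns a PredictionOutput named tuple; metrics is only
# populated when the dataset contains ground-truth labels
output = trainer.predict(encoded_dataset["test"])
logits = torch.from_numpy(output.predictions)

# for multi-label classification, apply a sigmoid and threshold each
# label independently (0.5 here is an assumed cutoff)
probs = torch.sigmoid(logits)
predicted_labels = (probs > 0.5).int()
print(output.metrics)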

Thanks for getting back to me. Maybe my question is more related to what's happening inside trainer.train() and the difference between validation and prediction.

After every training epoch (at least the way it is set up in the tutorial notebook), isn't the model being evaluated against the validation dataset? So why is trainer.evaluate() being run on the validation dataset? Wouldn't you want it to be the test dataset?
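
For context, the per-epoch evaluation during training is controlled by the TrainingArguments passed to the Trainer, not by trainer.evaluate() itself. A minimal sketch of the relevant setup, assuming the notebook's evaluate-once-per-epoch schedule (the output_dir, learning rate, and epoch count here are illustrative values, not the notebook's exact ones):

from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="bert-multilabel",    # illustrative path
    evaluation_strategy="epoch",     # evaluate on eval_dataset after every epoch
    save_strategy="epoch",
    learning_rate=2e-5,              # assumed value
    num_train_epochs=5,              # assumed value
)

With this, trainer.train() runs evaluation on eval_dataset (the validation split) after each epoch, and a later trainer.evaluate() with no arguments just repeats that on the same split.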

Hi! I have encountered the same problem when running the same notebook… Did you manage to find the answer?

I solved the problem by passing the test dataset as the evaluation parameter: trainer.evaluate(eval_dataset=encoded_dataset["test"])
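
As a small usage sketch: evaluate() also accepts a metric_key_prefix argument, so you can have the returned metrics reported as test_* instead of the default eval_* and keep them distinct from the validation metrics.

test_metrics = trainer.evaluate(
    eval_dataset=encoded_dataset["test"],
    metric_key_prefix="test",  # metrics come back as test_loss, test_f1, etc.
)
print(test_metrics)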