I am following the multi-label text classification tutorial from @nielsr, located here: Transformers-Tutorials/Fine_tuning_BERT_(and_friends)_for_multi_label_text_classification.ipynb at master · NielsRogge/Transformers-Tutorials · GitHub
My dataset is split into train, validation, and test sets. After training, trainer.evaluate() is called, which I think runs on the validation set. My question is: how do I use the trained model to predict the labels on my test set? Do I just call trainer.predict() immediately after trainer.evaluate(), like so?
trainer = Trainer(
    model,
    args,
    train_dataset=encoded_dataset["train"],
    eval_dataset=encoded_dataset["validation"],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)
trainer.evaluate()
trainer.predict(encoded_dataset["test"])
Or can I skip trainer.evaluate() and go straight to trainer.predict(), like so?
trainer = Trainer(
    model,
    args,
    train_dataset=encoded_dataset["train"],
    eval_dataset=encoded_dataset["validation"],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)
trainer.predict(encoded_dataset["test"])
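For context, here is a minimal sketch of what I would expect to do with the result of trainer.predict(). It assumes that the returned object exposes the raw logits in a predictions field and that, as in the tutorial's metric function, labels are obtained with a sigmoid and a 0.5 threshold. The helper names, the example logits, and the label names below are hypothetical and only there for illustration.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function for a single float."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def logits_to_labels(logits, id2label, threshold=0.5):
    """For each row of logits, return the label names whose probability >= threshold."""
    results = []
    for row in logits:
        probs = [sigmoid(float(v)) for v in row]
        results.append([id2label[i] for i, p in enumerate(probs) if p >= threshold])
    return results


# In the real setup the logits would come from the trainer (assumption):
#   output = trainer.predict(encoded_dataset["test"])
#   logits = output.predictions
# Hypothetical stand-in values so the sketch runs on its own:
example_logits = [[2.0, -1.0, 0.3], [-3.0, -0.2, -4.0]]
example_id2label = {0: "anger", 1: "joy", 2: "optimism"}  # hypothetical label names

decoded = logits_to_labels(example_logits, example_id2label)
print(decoded)  # [['anger', 'optimism'], []]
```

The threshold is kept as a parameter because 0.5 is only the default I am assuming; a different cutoff may suit other datasets better.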
Any help would be greatly appreciated. Thank you!