Unlabeled entries in evaluation subset?

The Prompt-based methods tutorial for prompt-based tuning uses the twitter_complaints subset of ought/raft. But from what I can see, all of the entries in the test split are Unlabeled.
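For reference, this is roughly how I checked (a minimal sketch; I'm assuming the `Label` column is a `ClassLabel`, as it appears to be in the RAFT configs):

```python
from collections import Counter
from datasets import load_dataset

# Load the same dataset/config as in the tutorial
dataset = load_dataset("ought/raft", "twitter_complaints")

# Map the integer class ids in the test split back to their string names
label_names = dataset["test"].features["Label"].names
counts = Counter(label_names[i] for i in dataset["test"]["Label"])
print(counts)  # every test example seems to be "Unlabeled"
```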

How are the evaluation metrics supposed to work in that case? Even if the trained model predicts the correct label, it will be compared against an "Unlabeled" ground truth, so the metrics become meaningless.

Or is there something happening with the evaluation that I don’t quite get?