Q&A evaluation: Mismatch in the number of predictions (775) and references (835)

I just ran the code provided in transformers/examples/pytorch/question-answering at main · huggingface/transformers · GitHub
using a custom dataset that follows the SQuAD dataset format, split into train, test, and validation sets, and uploaded privately as a HF Dataset.

The training went well; however, evaluation crashed with the following error:
File "…/envs/hf-pt/lib/python3.10/site-packages/evaluate/module.py", line 432, in compute
    self.add_batch(**inputs)
File "…/lib/python3.10/site-packages/evaluate/module.py", line 512, in add_batch
    raise ValueError(error_msg) from None
ValueError: Mismatch in the number of predictions (775) and references (835)

I tried running the code both with and without the flag --version_2_with_negative, which then leads to the following stack trace:
File "…/utils_qa.py", line 209, in postprocess_qa_predictions
    score_diff = null_score - best_non_null_pred["start_logit"] - best_non_null_pred["end_logit"]
UnboundLocalError: local variable 'null_score' referenced before assignment

Any idea what could cause this error?

I got a similar problem. Have you solved it?

I have solved it. In my case, there were duplicate question IDs in the validation set, so I recommend checking whether the IDs in your dataset are unique.
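For example, here is a minimal sketch of such a check (the dataset name is a placeholder for your own private dataset, and it assumes a SQuAD-style id column):

```python
from collections import Counter

from datasets import load_dataset

# Placeholder name: replace with your own private SQuAD-style dataset.
dataset = load_dataset("your-username/your-squad-style-dataset")

# Count how often each question id occurs in the validation split.
id_counts = Counter(dataset["validation"]["id"])
duplicates = {qid: n for qid, n in id_counts.items() if n > 1}

if duplicates:
    print(f"Found {len(duplicates)} duplicated ids, e.g. {list(duplicates)[:5]}")
else:
    print("All ids in the validation split are unique.")
```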

I am trying to summarize the articles in the CNN/DailyMail dataset, and it is still giving me:
ValueError: Mismatch in the number of predictions (145) and references (218)


I got a similar problem when I computed the WER in my program. I found that the input values to the evaluation function need to have the same shape in the first dimension, so I wrapped the inputs in lists, changing
wer_metric.compute(predictions=a, references=b)
to
wer_metric.compute(predictions=[a], references=[b])
It works for me.
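For example, here is a minimal sketch with the evaluate library (the sentences are made-up sample data):

```python
import evaluate

wer_metric = evaluate.load("wer")

# A single prediction/reference pair must still be wrapped in lists,
# so both inputs have length 1 in the first dimension.
prediction = "the cat sat on the mat"
reference = "the cat sat on a mat"

wer = wer_metric.compute(predictions=[prediction], references=[reference])
print(f"WER: {wer:.2f}")
```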