Q&A evaluation: Mismatch in the number of predictions (775) and references (835)

I just ran the code provided in transformers/examples/pytorch/question-answering at main · huggingface/transformers · GitHub
using a custom dataset that follows the SQuAD dataset format, split into train, test, and validation sets, and uploaded privately as a HF Dataset.

The training went well; however, evaluation crashed with the following error:
File "…/envs/hf-pt/lib/python3.10/site-packages/evaluate/module.py", line 432, in compute
    self.add_batch(**inputs)
File "…/lib/python3.10/site-packages/evaluate/module.py", line 512, in add_batch
    raise ValueError(error_msg) from None
ValueError: Mismatch in the number of predictions (775) and references (835)

I tried running the code both with and without the flag --version_2_with_negative, which then leads to the following stack trace:
File "…/utils_qa.py", line 209, in postprocess_qa_predictions
    score_diff = null_score - best_non_null_pred["start_logit"] - best_non_null_pred["end_logit"]
UnboundLocalError: local variable 'null_score' referenced before assignment

Any idea what could cause this error?

I got a similar problem. Have you solved it?

I have solved it. In my case, there were duplicate question IDs in the validation set, so I recommend checking whether the IDs in your dataset are unique.
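For example, here is a minimal sketch of such a check (the dataset name is a placeholder for your own private dataset, and it assumes a SQuAD-style id column):

```python
from collections import Counter

from datasets import load_dataset

# Placeholder name: replace with your own private SQuAD-style dataset.
dataset = load_dataset("your-username/your-squad-style-dataset")

# Count how often each question id occurs in the validation split.
id_counts = Counter(dataset["validation"]["id"])
duplicates = {qid: n for qid, n in id_counts.items() if n > 1}

if duplicates:
    print(f"Found {len(duplicates)} duplicated ids, e.g. {list(duplicates)[:5]}")
else:
    print("All ids in the validation split are unique.")
```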

I am trying to summarize the articles in the CNN/DailyMail dataset, and it is still giving me:
ValueError: Mismatch in the number of predictions (145) and references (218)


I got a similar problem when I computed the WER in my program. I found that the input values to the evaluation function need to have the same shape in the first dimension, so I wrapped the inputs in lists, changing
wer_metric.compute(predictions=a, references=b)
to
wer_metric.compute(predictions=[a], references=[b])
It works for me.
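For example, here is a minimal sketch with the evaluate library (the sentences are made-up sample data):

```python
import evaluate

wer_metric = evaluate.load("wer")

# A single prediction/reference pair must still be wrapped in lists,
# so both inputs have length 1 in the first dimension.
prediction = "the cat sat on the mat"
reference = "the cat sat on a mat"

wer = wer_metric.compute(predictions=[prediction], references=[reference])
print(f"WER: {wer:.2f}")
```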