ValueError when using `run_qa.py` to evaluate model

dattatreya303 · November 26, 2022, 4:26am

Context:

I have fine-tuned my model on the squad_v2 dataset using the run_qa.py script. Now I am trying to use the same script only to evaluate the fine-tuned model on the dattatreya303/covid-qa-tts dataset. I created this dataset by adapting covid_qa_deepset and adding train/test/val splits. However, running the following command on it raises a ValueError (as shown below).

Questions

I do not understand the error message “Predictions and/or references don’t match the expected format.” The inputs printed in the error message appear to match the expected format, and AFAIK the data is in the correct SQuAD format. Have I missed anything in the loading script of the dattatreya303/covid-qa-tts dataset?
Or am I missing an argument in the run_qa.py command?
Any help is appreciated!

Command:

!python run_qa.py \
  --model_name_or_path ./ft-roberta-squadv2/checkpoint-31500/ \
  --dataset_name dattatreya303/covid-qa-tts \
  --do_eval \
  --per_device_eval_batch_size 12 \
  --learning_rate 1e-5 \
  --max_seq_length 384 \
  --doc_stride 128 \
  --version_2_with_negative \
  --output_dir ./ft-roberta-squadv2-eval-cqa/

Stacktrace:

Traceback (most recent call last):
  File "run_qa.py", line 684, in <module>
    main()
  File "run_qa.py", line 641, in main
    metrics = trainer.evaluate()
  File "/content/transformers/examples/pytorch/question-answering/trainer_qa.py", line 58, in evaluate
    metrics = self.compute_metrics(eval_preds)
  File "run_qa.py", line 603, in compute_metrics
    return metric.compute(predictions=p.predictions, references=p.label_ids)
  File "/usr/local/lib/python3.7/dist-packages/evaluate/module.py", line 432, in compute
    self.add_batch(**inputs)
  File "/usr/local/lib/python3.7/dist-packages/evaluate/module.py", line 512, in add_batch
    raise ValueError(error_msg) from None
ValueError: Predictions and/or references don't match the expected format.
Expected format: {'predictions': {'id': Value(dtype='string', id=None), 'prediction_text': Value(dtype='string', id=None), 'no_answer_probability': Value(dtype='float32', id=None)}, 'references': {'id': Value(dtype='string', id=None), 'answers': Sequence(feature={'text': Value(dtype='string', id=None), 'answer_start': Value(dtype='int32', id=None)}, length=-1, id=None)}},
Input predictions: [{'id': 283, 'prediction_text': 'Betacoronavirus', 'no_answer_probability': 0.0}, {'id': 431, 'prediction_text': 'double-stranded', 'no_answer_probability': 0.0}, {'id': 4187, 'prediction_text': 'lapses in infection prevention and control (IPC) in healthcare settings', 'no_answer_probability': 0.0}, ..., {'id': 2771, 'prediction_text': 'the expected number of secondary infections', 'no_answer_probability': 0.0}, {'id': 3254, 'prediction_text': 'Persistent high fever, dyspnea and rapid progression to respiratory failure within 2 weeks', 'no_answer_probability': 0.0}, {'id': 3628, 'prediction_text': 'to identify published studies in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines', 'no_answer_probability': 0.0}],
Input references: [{'id': 283, 'answers': {'text': ['Betacoronavirus'], 'answer_start': [1723]}}, {'id': 431, 'answers': {'text': ['double-stranded ribonucleic acid'], 'answer_start': [4111]}}, {'id': 4187, 'answers': {'text': ['to lapses in infection prevention and control (IPC) in healthcare settings'], 'answer_start': [1932]}}, ..., {'id': 2771, 'answers': {'text': ['2.7-3.4 or 2-4 in Hong Kong'], 'answer_start': [15034]}}, {'id': 3254, 'answers': {'text': ['Persistent high fever, dyspnea and rapid progression to respiratory failure within 2 weeks, together with bilateral consolidations and infiltrates at the same time, are the most frequent clinical manifestations'], 'answer_start': [13068]}}, {'id': 3628, 'answers': {'text': ['to identify published studies examining the diagnosis, therapeutic drugs and vaccines for Severe Acute Respiratory Syndrome (SARS), Middle East Respiratory Syndrome (MERS) and the 2019 novel coronavirus (2019-nCoV), in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.'], 'answer_start': [4552]}}]
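The expected and actual structures line up key by key; what differs is the type of `id`, which the metric declares as `Value(dtype='string')` while the printed inputs carry ints. A minimal sketch of a type check that surfaces this, assuming the records are plain Python dicts shaped like the ones in the error message (`find_id_type_mismatches` is a hypothetical helper, not part of `evaluate`):

```python
from typing import Any, Dict, List


def find_id_type_mismatches(records: List[Dict[str, Any]]) -> List[Any]:
    """Return every `id` value that is not a str.

    Hypothetical helper: the squad_v2 metric declares `id` as
    Value(dtype='string'), so non-string ids (such as the ints in the
    traceback) do not match the expected format.
    """
    return [rec["id"] for rec in records if not isinstance(rec["id"], str)]


# Two records copied from the error message above (their ids are ints).
predictions = [
    {"id": 283, "prediction_text": "Betacoronavirus", "no_answer_probability": 0.0},
    {"id": 431, "prediction_text": "double-stranded", "no_answer_probability": 0.0},
]
references = [
    {"id": 283, "answers": {"text": ["Betacoronavirus"], "answer_start": [1723]}},
    {"id": 431, "answers": {"text": ["double-stranded ribonucleic acid"], "answer_start": [4111]}},
]

print(find_id_type_mismatches(predictions))  # [283, 431]
print(find_id_type_mismatches(references))   # [283, 431]
```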

akash418 · December 10, 2022, 6:17am

The expected format requires “id” to be a string, but the “id” values in the predictions and references passed to the metric are ints. Casting the “id” feature to string in the original dataset and running the same script again should solve the issue.
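Casting the column in the dataset itself (for example with 🤗 Datasets' `cast_column("id", Value("string"))`) is the cleaner fix. For a quick check without re-uploading the dataset, a minimal sketch of the same idea applied to the records just before they reach the metric (`stringify_ids` is a hypothetical helper, not part of run_qa.py or `evaluate`):

```python
from typing import Any, Dict, List


def stringify_ids(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return shallow copies of `records` with `id` converted to str.

    Hypothetical helper: the squad_v2 metric expects string ids, so this
    turns ints such as 283 into "283" and leaves all other fields untouched.
    """
    return [{**rec, "id": str(rec["id"])} for rec in records]


predictions = [{"id": 283, "prediction_text": "Betacoronavirus", "no_answer_probability": 0.0}]
references = [{"id": 283, "answers": {"text": ["Betacoronavirus"], "answer_start": [1723]}}]

fixed_preds = stringify_ids(predictions)
fixed_refs = stringify_ids(references)
print(repr(fixed_preds[0]["id"]), repr(fixed_refs[0]["id"]))  # '283' '283'
```

In run_qa.py this would amount to wrapping both arguments in `compute_metrics`, i.e. `metric.compute(predictions=stringify_ids(p.predictions), references=stringify_ids(p.label_ids))`, assuming `p.predictions` and `p.label_ids` are the lists of dicts shown in the error message.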
