When I run my inference code in Colab, I get the results below.
{
  'exact': 31.272635391223783,
  'f1': 35.63616173418905,
  'total': 11873,
  'HasAns_exact': 59.83468286099865,
  'HasAns_f1': 68.57424903340527,
  'HasAns_total': 5928,
  'NoAns_exact': 2.7922624053826746,
  'NoAns_f1': 2.7922624053826746,
  'NoAns_total': 5945,
  'best_exact': 50.07159100480081,
  'best_exact_thresh': 0.0,
  'best_f1': 50.07159100480081,
  'best_f1_thresh': 0.0
}
When we evaluate with the Hugging Face run_qa.py script below, we get much better results. What should I change in the Colab code to bring its EM and F1 up to match?
python run_qa.py \
--model_name_or_path /path/to/distilbert-squad2 \
--dataset_name squad_v2 \
--version_2_with_negative \
--do_eval \
--per_device_train_batch_size 12 \
--learning_rate 3e-5 \
--num_train_epochs 2 \
--max_seq_length 384 \
--doc_stride 128 \
--output_dir ./tmp
Eval Results:
02/25/2021 07:13:08 - INFO - __main__ - ***** Eval results *****
02/25/2021 07:13:08 - INFO - __main__ - HasAns_exact = 71.54183535762483
02/25/2021 07:13:08 - INFO - __main__ - HasAns_f1 = 78.03088635740741
02/25/2021 07:13:08 - INFO - __main__ - HasAns_total = 5928
02/25/2021 07:13:08 - INFO - __main__ - NoAns_exact = 72.22876366694702
02/25/2021 07:13:08 - INFO - __main__ - NoAns_f1 = 72.22876366694702
02/25/2021 07:13:08 - INFO - __main__ - NoAns_total = 5945
02/25/2021 07:13:08 - INFO - __main__ - best_exact = 71.88579129116482
02/25/2021 07:13:08 - INFO - __main__ - best_exact_thresh = 0.0
02/25/2021 07:13:08 - INFO - __main__ - best_f1 = 75.12567121424334
02/25/2021 07:13:08 - INFO - __main__ - best_f1_thresh = 0.0
02/25/2021 07:13:08 - INFO - __main__ - exact = 71.88579129116482
02/25/2021 07:13:08 - INFO - __main__ - f1 = 75.12567121424338
02/25/2021 07:13:08 - INFO - __main__ - total = 11873
Also, how can I generalize this Colab evaluation code so it works with other transformer-based question-answering models?
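For context, one thing I am double-checking on my side (an assumption, since my Colab code is not shown here): the SQuAD v2 metric normalizes answers (lowercasing, stripping punctuation and articles) before computing EM and token-level F1, so skipping that normalization would depress the scores. Below is a minimal sketch of that scoring logic, based on my reading of the official SQuAD v2 evaluation script, which a generalized evaluator could reuse for any model's predicted answer strings:

```python
import collections
import re
import string

def normalize_answer(s: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation,
    remove articles (a/an/the), and collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def compute_exact(gold: str, pred: str) -> int:
    """Exact match after normalization (1 or 0)."""
    return int(normalize_answer(gold) == normalize_answer(pred))

def compute_f1(gold: str, pred: str) -> float:
    """Token-level F1 between normalized gold and predicted answers.
    For no-answer examples (empty strings), F1 is 1 only if both are empty."""
    gold_toks = normalize_answer(gold).split()
    pred_toks = normalize_answer(pred).split()
    if not gold_toks or not pred_toks:
        return float(gold_toks == pred_toks)
    common = collections.Counter(gold_toks) & collections.Counter(pred_toks)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_toks)
    recall = num_same / len(gold_toks)
    return 2 * precision * recall / (precision + recall)
```

Since this scoring is model-agnostic (it only looks at answer strings), any QA model whose predictions are reduced to text spans, plus an empty string for no-answer, can be evaluated the same way.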