Hello,
I have fine-tuned BERT for question answering on SQuAD v2 and was able to achieve the following results:
{'exact': 75.76715597879051,
'f1': 86.10779795627346,
'total': 130319,
'HasAns_exact': 66.280047453957,
'HasAns_f1': 81.80143193308851,
'HasAns_total': 86821,
'NoAns_exact': 94.70320474504575,
'NoAns_f1': 94.70320474504575,
'NoAns_total': 43498,
'best_exact': 75.77176006568497,
'best_exact_thresh': 0.0,
'best_f1': 86.11240204317286,
'best_f1_thresh': 0.0}
As we can see, the model obtains a 94.7% F1-score on NoAns questions. However, when I use this model for inference with the pipeline API, it never predicts a no-answer question correctly: it always outputs some random answer, even when there is no possible correct answer in the given context.
I think the pipeline API is not prepared to handle no-answer questions (I could be wrong).
How can I use my model for inference so that it is able to predict that a question has no answer given a certain context? Thank you in advance.
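For reference, this is roughly the decision rule I believe the SQuAD v2 evaluation applies, and what I would expect inference to do as well (the pipeline's `handle_impossible_answer` flag looks related, but I am not sure it behaves the same way). A minimal sketch, assuming BERT-style QA logits where index 0 is the `[CLS]` token used as the "no answer" span; the function name and toy logits are my own, just for illustration:

```python
# Sketch of the SQuAD v2 null-answer decision: compare the score of the
# "no answer" prediction (start and end both at [CLS], index 0) against
# the best non-null span, using a threshold (0.0 in my eval results above).

def pick_answer(start_logits, end_logits, max_answer_len=30, null_threshold=0.0):
    # Score of predicting "no answer": start and end both at position 0.
    null_score = start_logits[0] + end_logits[0]

    # Best non-null span: start <= end, bounded length, skipping position 0.
    best_score, best_span = float("-inf"), None
    for s in range(1, len(start_logits)):
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = start_logits[s] + end_logits[e]
            if score > best_score:
                best_score, best_span = score, (s, e)

    # Predict "no answer" when the null score beats the best span
    # by more than the threshold.
    if null_score - best_score > null_threshold:
        return None  # unanswerable
    return best_span


# Toy logits where the model is most confident at position 0 ([CLS]):
start = [5.0, 1.0, 0.5, 0.2]
end = [5.0, 0.3, 1.2, 0.1]
print(pick_answer(start, end))  # None -> predicted unanswerable
```

If the pipeline skips this null-score comparison and always returns the best non-null span, that would explain the behavior I am seeing.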