Can a Q&A model say "I don't know"?

Hi, I am working on training a domain-specific ‘extractive Q&A’ model, and I was wondering whether the model could say “I don't know” when the answer is not in the context. Any comments?

I think SQuAD 2.0 introduced this kind of “unanswerable” question:

Demo on the model hub that uses a fine-tuned model:

https://huggingface.co/deepset/bert-base-cased-squad2?text=Where+lives+Wolfgang%3F&context=My+name+is+Wolfgang+and+I+live+in+Berlin

It outputs “Berlin” correctly. Now an unanswerable question (the answer is not given in the context):

https://huggingface.co/deepset/bert-base-cased-squad2?text=Where+was+Obama+born%3F&context=My+name+is+Wolfgang+and+I+live+in+Berlin
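For reference, here is a rough sketch of how SQuAD 2.0-style models usually decide between a span and "no answer": the score of the "null" span at the [CLS] position is compared with the best real span, and the model abstains if the null span wins by more than a threshold. The function name `best_span_or_null`, the toy logits, and the default values are my own assumptions for illustration, not the exact code behind the demo.

```python
from typing import List, Tuple


def best_span_or_null(
    start_logits: List[float],
    end_logits: List[float],
    null_threshold: float = 0.0,  # assumed default, normally tuned on a dev set
    max_answer_len: int = 15,     # assumed cap on span length in tokens
) -> Tuple[int, int]:
    """Return (start, end) token indices, or (0, 0) meaning "no answer".

    Assumes index 0 is the [CLS] token, which acts as the null span.
    """
    null_score = start_logits[0] + end_logits[0]

    best_score = float("-inf")
    best_span = (0, 0)
    # Search all real spans (start >= 1, end >= start, bounded length).
    for s in range(1, len(start_logits)):
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = start_logits[s] + end_logits[e]
            if score > best_score:
                best_score, best_span = score, (s, e)

    # Abstain when the null span beats the best real span by more than the threshold.
    if null_score - best_score > null_threshold:
        return (0, 0)
    return best_span


# Toy logits (made up): the first case favours the null span, the second a real span.
unanswerable = best_span_or_null([5.0, 0.1, 0.2, 0.3], [5.0, 0.1, 0.2, 0.3])
answerable = best_span_or_null([0.0, 0.1, 4.0, 0.2], [0.0, 0.1, 0.3, 5.0])
```

If you use the `question-answering` pipeline from transformers, I believe passing `handle_impossible_answer=True` makes it consider the empty answer as well, but please check the docs for your version.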

Thanks @stefan-it ! I knew SQuAD 2.0 had unanswerable questions, but I was not sure if more research had been conducted beyond that. (I’ll re-read the SQuAD paper)

Thank you for the example links. I tested a few more questions and it is hit or miss. For example, when I ask “Where does Sinatra live”, the answer is “Berlin”, which is incorrect.

https://huggingface.co/deepset/bert-base-cased-squad2?text=Where+does+Sinatra+live%3F&context=My+name+is+Wolfgang+and+I+live+in+Berlin

Not sure if this is the right approach, but maybe train a classifier which takes the context and question and classifies whether the question is answerable or not.
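As a starting point for such a classifier, the training pairs could come straight from SQuAD 2.0-style data, which marks unanswerable questions with an `is_impossible` flag. Below is a minimal sketch that flattens that JSON layout into (question, context, label) triples. The helper name `answerability_examples` and the toy record are made up for illustration, and the classifier itself is left out.

```python
from typing import Dict, List, Tuple


def answerability_examples(squad: Dict) -> List[Tuple[str, str, int]]:
    """Flatten SQuAD 2.0-style JSON into (question, context, label) triples.

    label is 1 if the question is answerable from the context, 0 otherwise.
    """
    examples = []
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                # Questions without the flag are treated as answerable (SQuAD 1.1 style).
                label = 0 if qa.get("is_impossible", False) else 1
                examples.append((qa["question"], context, label))
    return examples


# Toy record (made up) reusing the example from this thread.
toy = {
    "data": [{
        "paragraphs": [{
            "context": "My name is Wolfgang and I live in Berlin",
            "qas": [
                {"question": "Where lives Wolfgang?", "is_impossible": False},
                {"question": "Where was Obama born?", "is_impossible": True},
            ],
        }]
    }]
}

pairs = answerability_examples(toy)
```

The resulting triples could then be fed to any sentence-pair classifier, with question and context as the two inputs.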

Thank you @valhalla . I’ll take a look at that too.

Although not exactly similar, there are works on dialogue response where “None of the above” can be predicted.

This work talks about the same idea and could maybe be used to define the problem statement, e.g. by treating the context and the possible answers (including “none of the above”) as retrieved options.

Not completely sure if this is a right way to target the given problem.

I have no experience with Q&A models, but I would expect that you get probabilities for the output values. One could add a post-processing step where you return “I am not certain” if the probability is too low. However, this is not the best approach, because there may well be many valid options, which leads to a low highest probability.
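To make this concrete, here is a small sketch of that post-processing step, assuming the model gives one raw score per candidate answer. The function names, the 0.5 cutoff, and the scores are invented for illustration. The second call shows the drawback: three equally good variants of the same answer split the probability mass, so the step abstains even though every candidate is valid.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Turn raw scores into probabilities (shifted by the max for numerical stability)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def answer_or_abstain(candidates: List[str], scores: List[float], min_prob: float = 0.5) -> str:
    """Return the top candidate, or "I am not certain" if its probability is below min_prob."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < min_prob:
        return "I am not certain"
    return candidates[best]


# One clear winner: the top probability is high, so the answer is returned.
clear = answer_or_abstain(["Berlin", "Wolfgang"], [4.0, 0.0])

# Three equally valid variants: each gets about 1/3, so the step abstains.
split = answer_or_abstain(["Berlin", "in Berlin", "Berlin."], [3.0, 3.0, 3.0])
```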

Yes, setting a heuristic is indeed a way to tackle this. But it could turn out to be a little tricky because:

  1. How would we decide on a threshold, and how would we ensure that we have a data-driven way to calculate it, say for a new dataset or a slightly different output setting?
  2. As you know, NNs can learn patterns even when random labels are passed. So it could turn out that, even when the model should say “I don’t know”, the results are skewed towards one class or the other.

These are some issues I guess might occur, but yes as a quick fix and for something which has decent coverage this will work.
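On the first question (how to choose the threshold in a data-driven way), one option is to sweep candidate thresholds over a held-out dev set and keep the one with the best accuracy, and to repeat this for each new dataset. The sketch below assumes each dev item carries the model's confidence, whether its answer was correct, and whether the question was answerable at all. The function name and all numbers are made up for illustration.

```python
from typing import List, Tuple


def pick_threshold(dev: List[Tuple[float, bool, bool]]) -> Tuple[float, float]:
    """Pick the abstention threshold that maximises accuracy on a dev set.

    Each dev item is (confidence, answer_is_correct, question_is_answerable).
    The model abstains whenever confidence < threshold.
    """
    # Try every observed confidence, plus "always abstain".
    candidates = sorted({conf for conf, _, _ in dev}) + [float("inf")]
    best_t, best_acc = candidates[0], -1.0
    for t in candidates:
        right = 0
        for conf, correct, answerable in dev:
            if conf < t:
                # Abstaining is right only when the question is unanswerable.
                right += not answerable
            else:
                # Answering is right only when an answer exists and is correct.
                right += answerable and correct
        acc = right / len(dev)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


# Toy dev set (made-up numbers).
dev_set = [
    (0.95, True, True),
    (0.90, True, True),
    (0.85, False, True),   # answerable, but the model got it wrong
    (0.60, False, False),  # unanswerable
    (0.40, False, False),  # unanswerable
]
threshold, accuracy = pick_threshold(dev_set)
```

This does not solve the second issue: if the scores themselves are skewed, no threshold will separate the two cases well.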