Run_squad occasionally finds an answer to a question asked of a text fragment

mfeb · September 9, 2020, 6:12pm

We have built a causation-focused question answering capability based on ALBERT v2 xxlarge fine-tuned on SQuAD2, using the transformers run_squad script. It performs very well on most corpus files when asked

"What causes X?"

and

"What does X cause?"

We have noticed on a few occasions that when a question is asked of a very short paragraph (really, a snippet), we get an answer, and with a high score.

For example, to the question of

What does increasing demand cause?

We get the following answer (post-processed):

"cause": "Increasing demand",
"effect": "Changes in consumption patterns",
"score": 0.9729956935388504,
"context": "Impacts of increasing demand:",

Key here is that the “paragraph” is simply “Impacts of increasing demand:”
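One thing that may make the 0.97 look more meaningful than it is: as far as we understand the transformers SQuAD post-processing (`compute_predictions_logits` in `squad_metrics`), the reported score is a softmax over the summed start and end logits of only the n-best candidates. It is therefore relative to those few candidates, not an absolute confidence. A minimal sketch of that normalization, with made-up logits and hypothetical candidate spans:

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical n-best list for a very short context: (answer text, start_logit + end_logit).
# The logits are invented for illustration; "" stands for the no-answer (null) entry.
candidates = [
    ("Impacts of increasing demand", 6.0),
    ("increasing demand", 2.0),
    ("", 1.5),
]

probs = softmax([score for _, score in candidates])
for (text, _), p in zip(candidates, probs):
    print(f"{text!r}: {p:.3f}")
```

When only a handful of candidates exist and the runners-up score lower, the top one can receive a probability close to 1 even though its raw logits are modest, so a high score on a four-word context is not by itself evidence of a well-grounded answer.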

We could (and should) be filtering out these short phrases, but we had fully expected the question answering model to do that for us.
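A minimal sketch of the kind of pre-filter we mean, assuming a simple word-count cutoff plus a "bare heading ending in a colon" heuristic. `is_answerable_context` and `MIN_WORDS` are hypothetical names, and the threshold of 8 words is an arbitrary assumption, not a tuned value or anything run_squad provides:

```python
# Hypothetical pre-filter applied before contexts are sent to the QA model.
# MIN_WORDS is an assumed cutoff, not a run_squad setting.
MIN_WORDS = 8

def is_answerable_context(context: str, min_words: int = MIN_WORDS) -> bool:
    """Return True if the context has at least `min_words` whitespace-separated
    words and does not look like a bare heading ending in a colon."""
    text = context.strip()
    if text.endswith(":"):
        return False  # e.g. "Impacts of increasing demand:" is a heading, not a passage
    return len(text.split()) >= min_words

contexts = [
    "Impacts of increasing demand:",
    "This is a longer example paragraph with enough words to pass the filter.",
]
kept = [c for c in contexts if is_answerable_context(c)]
print(kept)
```

A length cutoff is crude; it will also drop short but genuinely answerable sentences, so the threshold would need checking against the corpus.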

More surprisingly, among the ones we see (since we're looking at the top k), the answers seem high quality and sensible.

So the question is: where are these coming from? Is there something about the language model and its exposure to a huge corpus that leads it to fill in the blank without justification in the text (a trick question on a reading comprehension test?), but at the same time give a sensible answer?

We’re flummoxed, and we’d like to know whether/how we might be able to control it.

BTW, in this case there is somewhat of a "causal signal": the word "Impacts." In a longer paragraph I would expect "the impacts of X are Y and Z" to yield an answer, e.g. X causes Y, or X causes Y and Z, or some such.
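On the question of control: assuming the SQuAD2 setup (run_squad with `--version_2_with_negative`), our understanding is that the model returns an empty answer when the no-answer score, taken from the start and end logits at the [CLS] position, exceeds the best span's score by more than `--null_score_diff_threshold`. The sketch below reimplements that comparison in isolation as we understand it; `Candidate` and `choose_answer` are hypothetical names, and the logits are made up:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    text: str
    start_logit: float
    end_logit: float

def choose_answer(null_start_logit: float, null_end_logit: float,
                  best_non_null: Candidate, threshold: float = 0.0) -> str:
    """Return "" (no answer) if the null score beats the best span by more than
    `threshold`; otherwise return the best span's text.

    score_diff = null_score - (best.start_logit + best.end_logit)
    """
    null_score = null_start_logit + null_end_logit
    span_score = best_non_null.start_logit + best_non_null.end_logit
    score_diff = null_score - span_score
    return "" if score_diff > threshold else best_non_null.text

# Made-up logits: null score 2.0, span score 3.5, so score_diff = -1.5.
best = Candidate("hypothetical span", start_logit=2.0, end_logit=1.5)
print(repr(choose_answer(1.0, 1.0, best, threshold=0.0)))   # span is kept
print(repr(choose_answer(1.0, 1.0, best, threshold=-2.0)))  # abstains
```

Lowering the threshold makes an empty answer more likely, so it may be worth tuning it on a held-out set that includes short, unanswerable snippets like the one above.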
