The BertForQuestionAnswering
architecture is perfect for single span question answering, i.e., extracting a single span (as the answer) from (i) a context & (ii) a question. It outputs two tensors: start_logits
& end_logits
.
But what if the "train"
split of the dataset provides multiple spans as the answer? (In this case, the objective is to predict multiple spans as the answer for a single example.)
What architecture works for this problem? Do we have to create something custom, or does HF Transformers provide a class (similar to BertForQuestionAnswering
)?