Hi,
A paper showed that you can treat multi-span question answering as token classification. They got really nice results with it: https://aclanthology.org/2020.emnlp-main.248.pdf
So basically you can treat a multi-span QA problem as a sequence labeling problem, hence you can use BertForTokenClassification
.