Fine-tuning a QA model on a SQUAD 2 dataset with more than one answer

Hello.

I have been able to successfully fine-tune some models for QA on a custom SQUAD dataset using scripts like run_qa.py and run_seq2seq_qa.py from https://github.com/huggingface/transformers/blob/main/examples/pytorch/question-answering/

But these scripts only handle SQUAD datasets in which every question has a single possible answer, because the code always takes the first answer:

For run_qa.py:
start_char = answers["answer_start"][0]

For run_seq2seq_qa.py:
targets = [answer["text"][0] if len(answer["text"]) > 0 else "" for answer in answers]
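
To make the limitation concrete, here is a minimal sketch of what those two lines do to a record with two answers. The record itself is made up for illustration, and the `end_char` line is my assumption about how the span end is derived from the first answer.

```python
# Hypothetical SQuAD-style answers field with two annotated answers.
example_answers = {
    "text": ["Paris", "the French capital"],
    "answer_start": [10, 42],
}

# Extractive case (run_qa.py style): only index 0 is read,
# so the second answer never becomes a training label.
start_char = example_answers["answer_start"][0]
end_char = start_char + len(example_answers["text"][0])  # assumed span end

# Generative case (run_seq2seq_qa.py style): only the first text is kept,
# and an empty string is used for unanswerable questions.
answers = [example_answers, {"text": [], "answer_start": []}]
targets = [answer["text"][0] if len(answer["text"]) > 0 else "" for answer in answers]

print(start_char, end_char, targets)
```

In both cases "the French capital" is silently ignored during training.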

In my SQUAD dataset, some questions have several answers. How should I deal with this situation?

I have tried concatenating the answers with a special separator in order to turn several answers into one, for example: “answer1 ##AND## answer2”. But this does not work properly: the model never returns that kind of concatenated answer.
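
For reference, a minimal sketch of that concatenation for the seq2seq targets. `join_answers` and `SEPARATOR` are hypothetical names of mine, not part of run_seq2seq_qa.py, and the sample data is made up.

```python
SEPARATOR = " ##AND## "  # arbitrary marker; any string absent from the answers would do


def join_answers(answer):
    """Build one target string from all answer texts of a single record."""
    texts = answer["text"]
    # Unanswerable questions keep the empty-string target, as in the original script.
    return SEPARATOR.join(texts) if len(texts) > 0 else ""


# Hypothetical batch: one multi-answer record and one unanswerable record.
answers = [
    {"text": ["answer1", "answer2"], "answer_start": [3, 57]},
    {"text": [], "answer_start": []},
]
targets = [join_answers(answer) for answer in answers]
print(targets)
```

This only applies to the generative script; an extractive model predicts a single contiguous span, so it cannot output a joined string unless that exact string appears in the context.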

I’ve also tried converting the SQUAD dataset to single-answer form, but then the dataset contains the same question and context several times, each time with a different, disjoint answer, and I find this approach weird.
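
A minimal sketch of that single-answer conversion, assuming records in the usual SQUAD layout (`id`, `question`, `context`, `answers`). `explode_answers` is a hypothetical helper name, and the `_0`, `_1` id suffixes are my own convention for keeping ids unique.

```python
def explode_answers(example):
    """Split one multi-answer record into one record per answer."""
    texts = example["answers"]["text"]
    starts = example["answers"]["answer_start"]
    if not texts:
        # Unanswerable question: keep the record unchanged.
        return [example]
    rows = []
    for i, (text, start) in enumerate(zip(texts, starts)):
        row = dict(example)  # shallow copy; question and context are shared
        row["id"] = f'{example["id"]}_{i}'
        row["answers"] = {"text": [text], "answer_start": [start]}
        rows.append(row)
    return rows


# Hypothetical record with two disjoint answers in the same context.
record = {
    "id": "q1",
    "question": "Which cities are mentioned?",
    "context": "Trains run between Madrid and Lisbon every day.",
    "answers": {"text": ["Madrid", "Lisbon"], "answer_start": [19, 30]},
}
rows = explode_answers(record)
print([(r["id"], r["answers"]) for r in rows])
```

Each resulting row has the same question and context but a different target span, which is exactly the conflicting supervision that makes this approach feel weird.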

So how do you go about training with a custom SQUAD 2 dataset with multiple answers for the same question?

Thanks in advance!

Did you ever figure out a solution for multiple answers?

Did you solve the issue?