I have been able to successfully fine-tune some models for QA on a custom SQuAD-format dataset using scripts like run_qa.py and run_seq2seq_qa.py from https://github.com/huggingface/transformers/blob/main/examples/pytorch/question-answering/
But these scripts are written only for SQuAD datasets in which every question has a single answer, because the code contains lines like these:
start_char = answers["answer_start"][0]
targets = [answer["text"][0] if len(answer["text"]) > 0 else "" for answer in answers]
In my SQuAD dataset, some questions can have several answers. How should I deal with this situation?
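To make the situation concrete, here is a minimal sketch of one such record and of what a preprocessing step keeps if it only reads the first entry of each answer list. The record is made-up data, and the variable names `first_span` and `ignored` are only for illustration; this is not the scripts' actual code.

```python
# Hypothetical SQuAD-format record with two gold answers (made-up data).
example = {
    "id": "q1",
    "question": "Which colors are on the flag?",
    "context": "The flag is red and white.",
    "answers": {"text": ["red", "white"], "answer_start": [12, 20]},
}

answers = example["answers"]

# A single-answer preprocessing step would only look at index 0.
start_char = answers["answer_start"][0]
end_char = start_char + len(answers["text"][0])
first_span = example["context"][start_char:end_char]

# Every other gold answer never reaches the training labels.
ignored = answers["text"][1:]

print("used for training:", first_span)
print("ignored:", ignored)
```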
I have tried concatenating the answers with a special separator to turn several answers into one, for example: “answer1 ##AND## answer2”. But this does not work properly: the model never returns that kind of concatenated answer.
I’ve also tried converting the dataset to single-answer form, but then the same question in the same context appears several times with different, disjoint answers, and this approach seems odd to me.
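For reference, the single-answer conversion I mean looks roughly like the sketch below. The function name `split_multi_answer_examples` and the `_0`, `_1` id suffixes are my own choices for illustration, and the sample record is made-up data.

```python
def split_multi_answer_examples(examples):
    """Turn each example with N answers into N single-answer examples."""
    flat = []
    for ex in examples:
        texts = ex["answers"]["text"]
        starts = ex["answers"]["answer_start"]
        if not texts:
            # Unanswerable (SQuAD 2 style) example: keep it unchanged.
            flat.append(ex)
            continue
        for i, (text, start) in enumerate(zip(texts, starts)):
            flat.append({
                **ex,
                # Assumption: ids must stay unique, so a suffix is added.
                "id": f"{ex['id']}_{i}",
                "answers": {"text": [text], "answer_start": [start]},
            })
    return flat


# Made-up record with two answers for the same question and context.
example = {
    "id": "q1",
    "question": "Which colors are on the flag?",
    "context": "The flag is red and white.",
    "answers": {"text": ["red", "white"], "answer_start": [12, 20]},
}

rows = split_multi_answer_examples([example])
for row in rows:
    print(row["id"], row["answers"])
```

Each resulting row has the same question and context but a different target span, which is exactly the part I find odd.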
So how do you go about training on a custom SQuAD 2 dataset with multiple answers for the same question?
Thanks in advance!