We want to fine-tune a QA model based on BertForQuestionAnswering.
After training, we feed input_ids/token_type_ids/attention_mask to get span-start and span-end scores, then take the index with the maximum span-start score as the predicted start_index and the index with the maximum span-end score as the predicted end_index.
But sometimes the predicted start_index is greater than the predicted end_index, which is not a valid span.
Is there any reasonable method to handle this situation? Thanks~
span-start scores = [-0.1, -2.1, 0.7, 1.3, 4.1]
span-end scores = [-0.7, 3, 5, -0.7, 3.3]
predicted start_index = 4
predicted end_index = 2
This is not a reasonable (valid) answer span.
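One common fix (used in standard SQuAD-style decoding, and similar in spirit to what the HuggingFace question-answering pipeline does) is to not take the two argmaxes independently, but instead search over all pairs (start, end) with start <= end and pick the pair maximizing start_score[start] + end_score[end], optionally capping the answer length. Here is a minimal sketch; `best_valid_span` and `max_answer_len` are illustrative names, not library APIs:

```python
import numpy as np

def best_valid_span(start_scores, end_scores, max_answer_len=None):
    """Pick the (start, end) pair maximizing start_scores[start] + end_scores[end],
    restricted to valid spans with start <= end (and an optional length cap)."""
    start_scores = np.asarray(start_scores, dtype=float)
    end_scores = np.asarray(end_scores, dtype=float)
    n = len(start_scores)
    # candidates[i, j] = score of the span starting at token i and ending at token j
    candidates = start_scores[:, None] + end_scores[None, :]
    # Invalidate spans with start > end (strict lower triangle).
    invalid = np.tril(np.ones((n, n), dtype=bool), k=-1)
    if max_answer_len is not None:
        # Also invalidate spans longer than max_answer_len tokens.
        invalid |= np.triu(np.ones((n, n), dtype=bool), k=max_answer_len)
    candidates[invalid] = -np.inf
    start, end = divmod(int(np.argmax(candidates)), n)
    return start, end

print(best_valid_span([-0.1, -2.1, 0.7, 1.3, 4.1], [-0.7, 3, 5, -0.7, 3.3]))
# -> (4, 4): valid span score 4.1 + 3.3 = 7.4, instead of the invalid pair (4, 2)
```

On the example above, the independent argmaxes give the invalid pair (4, 2), while the joint search returns (4, 4), the best-scoring valid span. In practice you would also exclude positions outside the context (e.g. question tokens and padding) before the search.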