I had a question when going through the implementation of BERT for question answering.
At L1876, in the implementation of BertForQuestionAnswering, the loss is calculated.
As far as I can tell, ignored_index is the sequence length, since it is used to clamp end_positions in the preceding lines.
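
To make sure I am reading the mechanism correctly, here is a minimal sketch of how I understand it. It is plain Python rather than the actual library code: `cross_entropy` and `clamp` are hypothetical stand-ins I wrote for illustration (for a loss with an ignore_index argument and for the tensor clamp), and the logits and positions are made-up values.

```python
import math


def clamp(value, low, high):
    """Restrict value to the closed range [low, high]."""
    return max(low, min(value, high))


def cross_entropy(logits, targets, ignore_index):
    """Mean cross-entropy over the rows whose target != ignore_index.

    Hypothetical stand-in for a framework loss with an ignore_index
    argument. Returns 0.0 if every target is ignored, which is a
    simplification made for this sketch.
    """
    losses = []
    for row, target in zip(logits, targets):
        if target == ignore_index:
            continue  # this example contributes nothing to the loss
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        losses.append(log_sum_exp - row[target])
    return sum(losses) / len(losses) if losses else 0.0


seq_len = 4
# Two examples with uniform logits over the 4 token positions (made-up data).
logits = [[0.0] * seq_len, [0.0] * seq_len]
# The second answer position lies outside the sequence.
end_positions = [1, 7]

ignored_index = seq_len
# Out-of-range positions collapse onto ignored_index, which is not a valid class.
clamped = [clamp(p, 0, ignored_index) for p in end_positions]

# Only the first example is counted; the second is skipped.
loss = cross_entropy(logits, clamped, ignore_index=ignored_index)
```

If this reading is right, valid positions are always in 0..seq_len-1, so seq_len can never collide with a real target, and any position clamped to it is dropped from the loss.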
I was wondering why ignore_index is set to ignored_index (== the sequence length) instead of leaving it at the default (