Why is the CrossEntropyLoss ignore_index set to sequence length in BertForQuestionAnswering

Hello,

I had a question when going through the implementation of BERT for question answering.

The loss is calculated at L1876 of the BertForQuestionAnswering implementation.

As far as I can tell, the variable ignored_index equals the sequence length, since it is used as the upper bound when clamping start_positions and end_positions in the preceding lines.
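
To make the question concrete, here is a minimal sketch of the pattern as I understand it. It is paraphrased rather than copied from the library source, and the batch size, sequence length, and label values are made up for illustration.

```python
import torch
from torch.nn import CrossEntropyLoss

torch.manual_seed(0)

seq_len = 8                              # assumed toy sequence length
start_logits = torch.randn(3, seq_len)   # shape (batch, seq_len)

# Hypothetical labels: the last two point outside the sequence,
# e.g. because the answer span was cut off by truncation.
start_positions = torch.tensor([2, 11, 50])

# ignored_index is the sequence length, i.e. one past the last valid class index.
ignored_index = start_logits.size(1)

# Clamping maps every out-of-range position onto ignored_index.
clamped = start_positions.clamp(0, ignored_index)

# Targets equal to ignored_index are skipped when the loss is averaged.
loss_fct = CrossEntropyLoss(ignore_index=ignored_index)
loss = loss_fct(start_logits, clamped)

# For comparison: the loss over the single in-range example only.
reference = CrossEntropyLoss()(start_logits[:1], start_positions[:1])
```

In this sketch both out-of-range labels end up equal to seq_len after the clamp, and the resulting loss matches the loss computed on the first example alone.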

Why is ignore_index set to ignored_index (i.e. the sequence length) instead of being left at the default (-100)?