Why is the CrossEntropyLoss ignore_index set to the sequence length in BertForQuestionAnswering?

tobigue · May 13, 2022, 10:39am

Hello,

I had a question when going through the implementation of BERT for question answering.

The loss is calculated at L1876 of the BertForQuestionAnswering implementation.

As far as I can tell, the variable ignored_index holds the sequence length, since it is used to clamp start_positions and end_positions in the preceding lines.

Why is ignore_index set to ignored_index (i.e. the sequence length) instead of being left at the default (-100)?
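To make the question concrete, here is a minimal plain-Python sketch of the pattern described above: positions are clamped to the range [0, sequence length], and the loss then skips any target equal to the sequence length. It only imitates the clamp and ignore_index semantics and is not the library code; the helper names clamp and cross_entropy, the toy logits, and the out-of-range position 7 are all made up for illustration.

```python
import math


def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(value, high))


def cross_entropy(logits, targets, ignore_index=-100):
    """Mean cross-entropy over the rows whose target != ignore_index.

    Hypothetical helper that imitates the ignore_index idea in plain Python.
    """
    losses = []
    for row, target in zip(logits, targets):
        if target == ignore_index:
            continue  # this example contributes nothing to the loss
        log_sum_exp = math.log(sum(math.exp(x) for x in row))
        losses.append(log_sum_exp - row[target])
    # Assumption for this sketch: return NaN when every target is ignored.
    return sum(losses) / len(losses) if losses else float("nan")


# Toy batch: 2 examples, sequence length 4 (made-up numbers).
start_logits = [[2.0, 0.5, 0.1, 0.0],
                [0.0, 1.0, 3.0, 0.2]]
# The second position lies outside the sequence, e.g. after truncation.
start_positions = [1, 7]

ignored_index = len(start_logits[0])  # the sequence length, here 4
clamped = [clamp(p, 0, ignored_index) for p in start_positions]

# Only the first example counts; the second was clamped onto ignored_index.
loss = cross_entropy(start_logits, clamped, ignore_index=ignored_index)
print(clamped, loss)
```

In this sketch, leaving ignore_index at -100 while still clamping to the sequence length would not skip the second example: its target, 4, is not a valid index into a row of 4 logits, so the lookup fails instead.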
