Hello,
I had a question when going through the implementation of BERT for question answering.
In L1876 in the implementation of BertForQuestionAnswering, the loss is calculated.
The variable ignored_index should be the sequence length, as it is used to clamp the start_positions and end_positions in the previous lines.
I was wondering why ignore_index is set to ignored_index (i.e. the sequence length) instead of being left at the default (-100).
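
To make the question concrete, here is a minimal sketch of how I understand the clamp-then-ignore logic. It is a pure-Python stand-in, not the library's code: the helpers clamp and cross_entropy are hypothetical names, the logits and positions are made-up values, and I am assuming the loss is a cross-entropy over token positions that skips any target equal to ignore_index and rejects targets outside the valid range.

```python
import math


def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(value, high))


def cross_entropy(logits_batch, targets, ignore_index=-100):
    """Mean negative log-likelihood over targets not equal to ignore_index.

    Hypothetical stand-in for a cross-entropy loss with an ignore_index
    argument; it raises IndexError for any other out-of-range target.
    """
    losses = []
    for logits, target in zip(logits_batch, targets):
        if target == ignore_index:
            continue  # this example contributes nothing to the loss
        if not 0 <= target < len(logits):
            raise IndexError(f"target {target} is out of bounds")
        # Numerically stable log-sum-exp of the logits.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
        losses.append(log_sum_exp - logits[target])
    return sum(losses) / len(losses) if losses else float("nan")


seq_len = 4
start_logits = [[2.0, 0.5, 0.1, 0.1], [0.3, 0.2, 1.5, 0.0]]
start_positions = [0, 7]  # 7 lies outside the sequence (assumed example)

# Valid targets are 0 .. seq_len - 1, so seq_len itself is never a real class.
ignored_index = seq_len
clamped = [clamp(p, 0, ignored_index) for p in start_positions]

# The out-of-range position was clamped to seq_len and is skipped here.
loss = cross_entropy(start_logits, clamped, ignore_index=ignored_index)
```

In this sketch, any position beyond the sequence is clamped to seq_len, so passing that same value as ignore_index drops it from the loss. With the default of -100, the clamped value would instead be treated as an out-of-range target. I am not sure whether this is the actual reason behind the choice in the implementation.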