The docs state that the masked language modeling objective is simply
```python
input_ids = tokenizer.encode('The <extra_id_0> walks in <extra_id_1> park', return_tensors='pt')
labels = tokenizer.encode('<extra_id_0> cute dog <extra_id_1> the <extra_id_2> </s>', return_tensors='pt')
model(input_ids=input_ids, labels=labels)
```
I was wondering if I need to manually set the `additional_special_tokens_ids` (corresponding to the `<extra_id_#>` sentinels) in the labels to -100 during training so that they are ignored by the loss. It seems that at least the `pad_token_id` is changed to -100 in `examples/seq2seq`, but it's not clear whether the same should be done for the sentinels.
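For concreteness, here is the kind of label masking I mean; a minimal pure-Python sketch (on real tensors one would write something like `labels[labels == token_id] = -100`), where `mask_ids` and the concrete token ids are made up for illustration:

```python
def mask_ids(label_ids, ids_to_ignore):
    """Replace every id in ids_to_ignore with -100 so cross-entropy skips it."""
    return [-100 if t in ids_to_ignore else t for t in label_ids]

# Hypothetical ids: suppose 0 is pad_token_id and 32099 is <extra_id_0>.
labels = [32099, 2024, 1234, 5678, 0, 0]

print(mask_ids(labels, {0}))           # mask padding only, as examples/seq2seq does
print(mask_ids(labels, {0, 32099}))    # additionally mask the sentinel -- is this needed?
```

The question is whether the second variant (masking the sentinel ids as well as padding) is required, or whether the model is supposed to predict the sentinels as part of the target sequence.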