The docs state that the masked language modeling objective is simply
```python
input_ids = tokenizer.encode('The <extra_id_0> walks in <extra_id_1> park', return_tensors='pt')
labels = tokenizer.encode('<extra_id_0> cute dog <extra_id_1> the <extra_id_2> </s>', return_tensors='pt')
model(input_ids=input_ids, labels=labels)
```
I was wondering if I need to manually set the `additional_special_tokens_ids` (corresponding to the `<extra_id_#>` sentinels) in the labels to -100 during training so that they are ignored by the loss. It seems that at least the `pad_token_id` is changed to -100 in `examples/seq2seq`, but it's not clear whether the same should be done for the sentinels.
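For concreteness, here is the kind of label masking I mean; a minimal pure-Python sketch (on real tensors one would write something like `labels[labels == token_id] = -100`), where `mask_ids` and the concrete token ids are made up for illustration:

```python
def mask_ids(label_ids, ids_to_ignore):
    """Replace every id in ids_to_ignore with -100 so cross-entropy skips it."""
    return [-100 if t in ids_to_ignore else t for t in label_ids]

# Hypothetical ids: suppose 0 is pad_token_id and 32099 is <extra_id_0>.
labels = [32099, 2024, 1234, 5678, 0, 0]

print(mask_ids(labels, {0}))           # mask padding only, as examples/seq2seq does
print(mask_ids(labels, {0, 32099}))    # additionally mask the sentinel -- is this needed?
```

The question is whether the second variant (masking the sentinel ids as well as padding) is required, or whether the model is supposed to predict the sentinels as part of the target sequence.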