I have a question.
When training with BertForMaskedLM, is the training data below correct?
- token2idx
<pad>: 0, <mask>: 1, <cls>: 2, <sep>: 3
- max_len: 8
- input token
<cls> hello i <mask> cats <sep>
- input ids
[2, 34, 45, 1, 56, 3, 0, 0]
- attention_mask
[1, 1, 1, 1, 1, 1, 0, 0]
- labels
[-100, -100, -100, 64, -100, -100, -100, -100]
In particular, I wonder whether padding tokens should also be assigned -100 in labels, as I did above.
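For context, here is a minimal sketch of how I build these labels in plain Python, assuming the toy vocabulary above and that the masked word's original id is 64 (that id is hypothetical, just matching the example):

```python
# Minimal sketch of MLM label construction for the example above.
# Token ids (<pad>=0, <mask>=1, <cls>=2, <sep>=3) are the toy ids from this post.
PAD_ID, MASK_ID = 0, 1
IGNORE = -100  # -100 is ignored by PyTorch's CrossEntropyLoss (its default ignore_index)

original_ids = [2, 34, 45, 64, 56, 3, 0, 0]  # sequence before masking; 64 is the masked word (assumed)
input_ids    = [2, 34, 45,  1, 56, 3, 0, 0]  # <mask> replaces token 64

# label = original token id at masked positions, -100 everywhere else
# (including the padding positions)
labels = [orig if inp == MASK_ID else IGNORE
          for inp, orig in zip(input_ids, original_ids)]

# attention_mask: 1 for real tokens, 0 for padding
attention_mask = [0 if i == PAD_ID else 1 for i in input_ids]

print(labels)          # [-100, -100, -100, 64, -100, -100, -100, -100]
print(attention_mask)  # [1, 1, 1, 1, 1, 1, 0, 0]
```

This reproduces exactly the `labels` and `attention_mask` arrays in my example, so my question is whether this is the intended way to handle padding.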