Does BERT Use Both Segments of a Sequence When Predicting Masked Tokens?

I am pre-training a BERT model with different approaches, and I want to understand what happens when the input is a sequence of two sentences. For example, suppose we are doing dual-task pre-training with MLM and NSP, and the input sequence is:

[CLS] the man [MASK] to the store [SEP] penguin [MASK] are flight ##less birds [SEP]
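To make the setup concrete, here is a minimal sketch of how I would build such a two-segment input, assuming the Hugging Face transformers tokenizer (the checkpoint name and variable names below are just illustrative):

```python
# Sketch: build a two-segment MLM/NSP-style input and inspect token_type_ids.
# Assumes the Hugging Face transformers library and the bert-base-uncased checkpoint.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

segment_a = "the man [MASK] to the store"
segment_b = "penguin [MASK] are flightless birds"

# Passing two texts makes the tokenizer emit [CLS] ... [SEP] ... [SEP]
# and set token_type_ids to 0 for the first segment and 1 for the second.
encoding = tokenizer(segment_a, segment_b, return_tensors="pt")

print(tokenizer.convert_ids_to_tokens(encoding["input_ids"][0]))
print(encoding["token_type_ids"][0])
```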

So when predicting the first masked token in the first segment, does the BERT model take the second segment into consideration? If so, how? Is there a control mechanism based on token_type_ids specifically for the MLM task? If not, does it simply ignore the segment boundary and attend to the whole sequence? :thinking:
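In case it helps clarify what I mean, here is a minimal sketch of how one could inspect this, assuming the Hugging Face BertModel API with output_attentions=True (I am not sure this is the right way to check it):

```python
# Sketch: measure how much attention the first [MASK] (segment A) pays to
# tokens in segment B. Assumes Hugging Face transformers and bert-base-uncased.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)

encoding = tokenizer(
    "the man [MASK] to the store",
    "penguin [MASK] are flightless birds",
    return_tensors="pt",
)

with torch.no_grad():
    outputs = model(**encoding)

# outputs.attentions: one tensor per layer, shape (batch, heads, seq_len, seq_len)
mask_positions = (encoding["input_ids"][0] == tokenizer.mask_token_id).nonzero()
first_mask = mask_positions[0].item()
in_segment_b = encoding["token_type_ids"][0] == 1

# Attention paid by the first [MASK] to second-segment tokens,
# averaged over heads in the last layer.
attn = outputs.attentions[-1][0].mean(dim=0)  # (seq_len, seq_len)
print(attn[first_mask][in_segment_b].sum().item())
```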