Hi all. I just finished reading the original BERT paper, but I'm confused about the masked language model. Of the selected tokens, 80% are replaced with [MASK], 10% are kept unchanged, and 10% are replaced with a random word. What happens to the model's learning process under each of these three conditions? (I've added a rough sketch of my understanding of the procedure below.)
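To make sure I'm reading the procedure correctly, here is a minimal sketch of how I understand the 80/10/10 masking step. The token list, vocabulary, and `mask_tokens` function are just placeholders for illustration, not BERT's actual implementation:

```python
import random

def mask_tokens(tokens, vocab, select_prob=0.15):
    """My understanding of BERT-style masking: select ~15% of tokens,
    then 80% -> [MASK], 10% -> random word, 10% -> left unchanged."""
    inputs, labels = [], []
    for tok in tokens:
        if random.random() < select_prob:
            labels.append(tok)  # the model must predict the original token here
            r = random.random()
            if r < 0.8:
                inputs.append("[MASK]")              # 80%: replace with [MASK]
            elif r < 0.9:
                inputs.append(random.choice(vocab))  # 10%: replace with a random word
            else:
                inputs.append(tok)                   # 10%: keep the original word
        else:
            inputs.append(tok)
            labels.append(None)  # not selected -> no prediction target
    return inputs, labels

tokens = "my dog is hairy".split()
vocab = ["apple", "runs", "blue", "dog", "table"]  # toy vocabulary
print(mask_tokens(tokens, vocab))
```

In all three cases the loss is computed only on the selected positions, so the model is always asked to recover the original word at those positions.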
Also, it seems to me that even when a selected word is left unmasked, the model can still learn something from the sentence, so why is there a label leakage problem? Is there a straightforward explanation of what label leakage means in this context?