I was reading the RoBERTa paper, and it seems that DataCollatorForLanguageModeling does not perform the same masked language model masking.
The authors describe BERT's masked token selection:
“BERT uniformly selects 15% of the input tokens for possible replacement. Of the selected tokens, 80% are replaced with [MASK], 10% are left unchanged, and 10% are replaced by a randomly selected vocabulary token.”
It seems that DataCollatorForLanguageModeling does not perform this scheme of replacing 80% with [MASK], leaving 10% unchanged, and replacing 10% with random tokens. It is not mentioned in the documentation:
How can I do that for masked language models?
Is there a way to use a DataCollator to get the exact same text processing that BERT does?
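For reference, this is the selection scheme from the paper that I am trying to reproduce, as a minimal PyTorch sketch. The function name and arguments here are my own for illustration, not something from the transformers library:

```python
import torch

def bert_style_masking(input_ids, mask_token_id, vocab_size, mlm_probability=0.15):
    """Sketch of BERT-style MLM masking: uniformly select 15% of tokens;
    of those, 80% -> [MASK], 10% -> random token, 10% -> left unchanged."""
    labels = input_ids.clone()

    # Uniformly select mlm_probability of the positions for prediction.
    probability_matrix = torch.full(labels.shape, mlm_probability)
    selected = torch.bernoulli(probability_matrix).bool()
    labels[~selected] = -100  # non-selected tokens are ignored by the loss

    # 80% of the selected tokens are replaced with [MASK].
    replaced = torch.bernoulli(torch.full(labels.shape, 0.8)).bool() & selected
    input_ids[replaced] = mask_token_id

    # Half of the remaining 20% (i.e. 10% overall) get a random vocabulary token.
    randomized = (
        torch.bernoulli(torch.full(labels.shape, 0.5)).bool() & selected & ~replaced
    )
    random_tokens = torch.randint(vocab_size, labels.shape, dtype=torch.long)
    input_ids[randomized] = random_tokens[randomized]

    # The final 10% of selected tokens are left unchanged.
    return input_ids, labels
```

I would expect something equivalent to this inside the collator, but I could not find it documented.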
Thanks in advance.