I am experimenting with various models in my NLP research, and I understand that one of the advantages of RoBERTa over BERT is that the former uses dynamic masking. My question is where this distinct masking is actually implemented. As far as I can tell, all masking takes place in data_collator.py. Is that a correct assumption? I cannot see any distinction made there between RoBERTa and BERT, so I worry that the Hugging Face implementation does not give RoBERTa this advantage of dynamic masking.
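
For reference, here is a minimal sketch of how I have been probing this, assuming that DataCollatorForLanguageModeling is the collator responsible for the masking I am asking about (the checkpoint name and the two-call probe are just my own illustration, not anything from the docs):

```python
import torch
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

# "roberta-base" is just an example checkpoint for this probe.
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=True,
    mlm_probability=0.15,  # the standard 15% masking rate
)

example = tokenizer("Dynamic masking should vary across epochs.")

# Collate the same example twice: if the masking is dynamic
# (re-sampled on every batch), the masked positions should
# usually differ between the two calls.
batch1 = collator([example])
batch2 = collator([example])
print(torch.equal(batch1["input_ids"], batch2["input_ids"]))  # usually False
```

If my assumption is right, this would suggest the masking is re-applied each time an example is collated rather than fixed once during preprocessing, but I would appreciate confirmation of whether that is how it works for both models.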