I am experimenting with various models in my NLP research, and I understand that one of the advantages of RoBERTa over BERT is that the former uses dynamic masking. My question is where this distinct masking is actually implemented. As far as I can tell, all masking takes place in data_collator.py. Is that a correct assumption? I cannot see any distinction made there between RoBERTa and BERT, so I worry that the Hugging Face implementation does not give RoBERTa this advantage of dynamic masking.
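
For reference, here is a minimal sketch of how I have been probing this, assuming that DataCollatorForLanguageModeling is the collator responsible for the masking I am asking about (the checkpoint name and the two-call probe are just my own illustration, not anything from the docs):

```python
import torch
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

# "roberta-base" is just an example checkpoint for this probe.
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=True,
    mlm_probability=0.15,  # the standard 15% masking rate
)

example = tokenizer("Dynamic masking should vary across epochs.")

# Collate the same example twice: if the masking is dynamic
# (re-sampled on every batch), the masked positions should
# usually differ between the two calls.
batch1 = collator([example])
batch2 = collator([example])
print(torch.equal(batch1["input_ids"], batch2["input_ids"]))  # usually False
```

If my assumption is right, this would suggest the masking is re-applied each time an example is collated rather than fixed once during preprocessing, but I would appreciate confirmation of whether that is how it works for both models.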