I wanted to ask whether it is common practice to clone (duplicate) the training dataset when pre-training a model with the MLM technique. Because masking is probabilistic, the model will likely mask different tokens each time, giving it a different task even when the same sentence is provided twice.
First iteration of dataset: I ate an apple —masking—> I [MASK] an apple
Second iteration of dataset: I ate an apple —masking—> I ate [MASK] apple
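To illustrate the idea from the two iterations above, here is a minimal sketch of dynamic masking (names like `random_mask` are my own; this simplifies the full BERT-style MLM recipe, which also keeps or randomly replaces a fraction of the selected tokens instead of always using [MASK]):

```python
import random

def random_mask(tokens, mask_prob=0.15, mask_token="[MASK]", seed=None):
    """Replace each token with [MASK] independently with probability
    mask_prob. Re-sampling this every epoch means the same sentence
    produces different training examples without cloning the dataset."""
    rng = random.Random(seed)
    return [mask_token if rng.random() < mask_prob else t for t in tokens]

tokens = "I ate an apple".split()
# Two passes over the same sentence mask different positions:
print(random_mask(tokens, mask_prob=0.3, seed=1))
print(random_mask(tokens, mask_prob=0.3, seed=2))
```

If masking is applied on the fly like this (rather than once, ahead of time), cloning the dataset is unnecessary, since each epoch already sees freshly masked versions of every sentence.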
Thanks in advance for every comment.