Creating masked sentences


I am retraining RoBERTa and want to evaluate its MLM accuracy before and after retraining to see the difference. I know that a data collator can be used during retraining to generate masked input.

But for evaluation, if I have a selected set of 100 sentences, how can I generate a masked version of these sentences?

For example, if the sentence was: This movie was very interesting to watch
I want to have a sentence that looks like this: This movie was very ###### to watch.

How can I do this?

Thanks in advance!

Hi! You can use the data collator for evaluation as well.
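
If you would rather have one fixed mask per sentence, so that the same masked sentences can be scored before and after retraining, a small helper along these lines might be enough. This is only a rough sketch and not what the data collator does: it masks whole whitespace-separated words rather than tokenizer tokens, so punctuation attached to a word is masked with it. The names `mask_one_word` and `build_eval_set` are made up for illustration, and `<mask>` is assumed as the mask string because that is RoBERTa's mask token.

```python
import random

MASK_TOKEN = "<mask>"  # RoBERTa's mask token; other models use e.g. "[MASK]"


def mask_one_word(sentence, rng, mask_token=MASK_TOKEN):
    """Replace one randomly chosen word with the mask token.

    Returns a (masked_sentence, original_word) pair so the original word
    can be used later as the label when computing accuracy.
    """
    words = sentence.split()
    if not words:
        return sentence, None
    idx = rng.randrange(len(words))
    target = words[idx]
    words[idx] = mask_token
    return " ".join(words), target


def build_eval_set(sentences, seed=0):
    """Mask every sentence with a seeded RNG so the masks are reproducible."""
    rng = random.Random(seed)
    return [mask_one_word(s, rng) for s in sentences]


sentences = ["This movie was very interesting to watch"]
eval_set = build_eval_set(sentences, seed=0)
for masked, target in eval_set:
    print(masked, "->", target)
```

Because the seed is fixed, calling `build_eval_set` again with the same sentences gives the same masked positions, so both model checkpoints are scored on identical inputs.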