Creating masked sentences

ssam9 · February 27, 2022, 10:10pm

Hi,

I am retraining RoBERTa and want to compare its MLM accuracy before and after retraining. I know that a data collator can be used during retraining to generate masked input.

But for evaluating, if I have a selected set of 100 sentences, how can I generate a masked version of these sentences?

For example, if the sentence was: This movie was very interesting to watch
I want to have a sentence that looks like this: This movie was very ###### to watch.

How can I do this?

Thanks in advance!

lhoestq · March 2, 2022, 10:44am

Hi! You can use the data collator for evaluation as well.
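
For anyone landing here, a minimal sketch of both routes follows. It assumes the collator meant above is `DataCollatorForLanguageModeling` from `transformers`, and that the checkpoint is `roberta-base` (swap in your own). The helper names `mask_one_word`, `build_eval_set`, and `collate_for_eval` are made up for illustration. The first two helpers produce plain-text masked sentences like the one in the question; the third applies the training-style token-level masking with a fixed seed so the same masks can be reused before and after retraining.

```python
import random

MASK_TOKEN = "<mask>"  # RoBERTa's mask token; BERT-style tokenizers use "[MASK]"


def mask_one_word(sentence, rng, mask_token=MASK_TOKEN):
    """Replace one randomly chosen whitespace-separated word with mask_token.

    Returns (masked_sentence, original_word). Punctuation attached to a word
    (e.g. "watch.") is masked together with it.
    """
    words = sentence.split()
    if not words:
        raise ValueError("cannot mask an empty sentence")
    idx = rng.randrange(len(words))
    target = words[idx]
    words[idx] = mask_token
    return " ".join(words), target


def build_eval_set(sentences, seed=0):
    """Mask one word per sentence, reproducibly for a given seed."""
    rng = random.Random(seed)
    return [mask_one_word(s, rng) for s in sentences]


def collate_for_eval(sentences, model_name="roberta-base", seed=0):
    """Token-level masking with the collator used in MLM training.

    Assumption: transformers and torch are installed; they are imported
    lazily so the helpers above work without them.
    """
    from transformers import (
        AutoTokenizer,
        DataCollatorForLanguageModeling,
        set_seed,
    )

    set_seed(seed)  # fix the random masks so both models see the same batch
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    collator = DataCollatorForLanguageModeling(
        tokenizer=tokenizer, mlm=True, mlm_probability=0.15
    )
    features = [tokenizer(s) for s in sentences]
    # Returns input_ids, attention_mask and labels; labels are -100
    # everywhere except at the positions selected for prediction.
    return collator(features)


if __name__ == "__main__":
    for masked, target in build_eval_set(
        ["This movie was very interesting to watch"], seed=0
    ):
        print(masked, "| answer:", target)
```

The plain-text sentences can be fed to a `fill-mask` pipeline, but note that a single `<mask>` predicts a single token, so words that the tokenizer splits into several pieces will not be recovered exactly. For a like-for-like accuracy comparison, building the batch once with `collate_for_eval` and scoring both checkpoints on it is probably the simpler option.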
