Changing the mlm_probability
argument won't give you the result you need, since it only sets the fraction of tokens masked at random, not which tokens get masked.
I think you can instead create a subclass of DataCollatorForLanguageModeling
that does the emoji masking.
You can find the source code for DataCollatorForLanguageModeling
here.
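
To give an idea of what the overridden masking step would have to do, here is a minimal sketch in plain Python, kept independent of the library so it runs on its own. The function name `mask_emoji_tokens`, the token ids, and the mask id 103 are all made up for illustration; in a real subclass you would take the ids from your tokenizer and apply the same logic to tensors inside the masking method (I believe that is `torch_mask_tokens` in recent versions of transformers, but check the source for the version you have installed).

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def mask_emoji_tokens(input_ids, emoji_token_ids, mask_token_id):
    """Return (masked_inputs, labels) for one tokenized sequence.

    Every emoji token is replaced by the mask token, and its original id
    becomes the label. All other positions keep their id and get
    IGNORE_INDEX as label, so they do not contribute to the loss.
    """
    masked_inputs = []
    labels = []
    for token_id in input_ids:
        if token_id in emoji_token_ids:
            masked_inputs.append(mask_token_id)
            labels.append(token_id)
        else:
            masked_inputs.append(token_id)
            labels.append(IGNORE_INDEX)
    return masked_inputs, labels


# Hypothetical ids: 50001 and 50002 stand in for two emoji tokens.
example_ids = [101, 2023, 50001, 2003, 50002, 102]
masked, labels = mask_emoji_tokens(example_ids, {50001, 50002}, mask_token_id=103)
print(masked)
print(labels)
```

This sketch masks every emoji it finds. If you only want to mask some of them, you could draw a random number per emoji position and compare it against a probability of your choice before masking.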