How can I train an MLM without labels?

I would like to train a masked language model to generate text using reinforcement learning. Since the generated text should not match the original masked-out text, but should instead be optimized against a reward function, what should the labels for the model be?
When training a T5 model, I would normally need to supply either decoder inputs or labels, but I do not want to use teacher forcing; instead, I want to feed the generated tokens back in autoregressively. I am unclear about how to set up this sort of training.
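
For concreteness, here is a rough sketch of the kind of training step I have in mind: sample a completion autoregressively, then re-run the model with the sampled tokens as labels to get differentiable log-probabilities, and weight them by the reward (REINFORCE-style). `reward_fn` is just a placeholder for my actual reward function, and I am not sure this is the right approach:

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Placeholder reward -- stands in for my actual reward function.
def reward_fn(texts):
    return torch.tensor([float(len(t)) / 100.0 for t in texts])

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

prompts = ["The <extra_id_0> sat on the mat."]
inputs = tokenizer(prompts, return_tensors="pt", padding=True)

# 1. Sample a completion autoregressively (no teacher forcing).
with torch.no_grad():
    generated = model.generate(**inputs, do_sample=True, max_new_tokens=20)

# generate() prepends the decoder start token (the pad token for T5),
# so strip it before using the sampled sequence as labels.
labels = generated[:, 1:]

# 2. Re-run the model with the sampled tokens as labels, which gives
#    differentiable log-probabilities of exactly those tokens.
outputs = model(**inputs, labels=labels)
log_probs = torch.log_softmax(outputs.logits, dim=-1)
token_log_probs = log_probs.gather(-1, labels.unsqueeze(-1)).squeeze(-1)

# Mask out padding so it does not contribute to the loss.
mask = (labels != tokenizer.pad_token_id).float()
seq_log_prob = (token_log_probs * mask).sum(dim=-1)

# 3. REINFORCE: weight the sequence log-probability by the reward.
texts = tokenizer.batch_decode(generated, skip_special_tokens=True)
rewards = reward_fn(texts)
loss = -(rewards * seq_log_prob).mean()

loss.backward()
optimizer.step()
optimizer.zero_grad()
```

Is this generate-then-rescore pattern the right way to avoid teacher forcing, or is there a more standard way to train a masked language model with a reward signal instead of labels?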