Unable to use custom dataset when training a tokenizer

Okay, thanks for that. I have trained my own tokenizer from scratch; how do I now use it in the masked language modeling (MLM) task?
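
In case it helps, here is a minimal sketch of one way to wire a from-scratch tokenizer into an MLM setup, assuming the tokenizer was trained with the Hugging Face `tokenizers` library and saved as `tokenizer.json` (the file path, special tokens, and the BERT-style config are all assumptions, not part of your setup):

```python
from transformers import (
    PreTrainedTokenizerFast,
    BertConfig,
    BertForMaskedLM,
    DataCollatorForLanguageModeling,
)

# Wrap the trained tokenizer file so it works with transformers.
# "tokenizer.json" is a hypothetical path; adjust to wherever you saved yours.
tokenizer = PreTrainedTokenizerFast(
    tokenizer_file="tokenizer.json",
    unk_token="[UNK]",
    pad_token="[PAD]",
    mask_token="[MASK]",  # required for masked language modeling
)

# A fresh BERT-style model sized to the custom vocabulary (assumed architecture).
config = BertConfig(vocab_size=tokenizer.vocab_size)
model = BertForMaskedLM(config)

# The collator dynamically masks ~15% of tokens to create the MLM objective;
# pass it to a Trainer along with your tokenized dataset.
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)
```

From there you would tokenize your dataset with this `tokenizer` and hand `model`, `data_collator`, and the tokenized dataset to a `Trainer` as in the standard MLM examples.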