I trained a tokenizer and a language model from scratch using a vocabulary size of 400,000.
Now, when I use my trained model for masked language modeling (MLM) with top_k=400000, some of the words in my vocab.txt do not appear in the results. For example, the word “rain” is in my vocab.txt, but when I run MLM inference, “rain” is nowhere in the top 400,000 predictions. Why is this?
Does it have something to do with the MLM masking probability of 0.15?
I am a bit confused.
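To illustrate what I expect top_k to do, here is a toy sketch with a tiny made-up vocabulary and made-up scores (not my actual model): if k equals the full vocabulary size, ranking by score should return every token, so nothing from the vocab should be missing.

```python
# Toy sketch: ranking an entire vocabulary by score (made-up values).
vocab = ["rain", "##rain", "sun", "[MASK]", "the"]  # tiny hypothetical vocab
scores = {"rain": -9.0, "##rain": 2.5, "sun": 1.0, "[MASK]": -20.0, "the": 0.3}

# top_k with k == len(vocab) just sorts ALL tokens by score, so every
# vocab entry must appear somewhere in the ranked list, however unlikely.
ranked = sorted(vocab, key=lambda t: scores[t], reverse=True)
print(ranked)  # ['##rain', 'sun', 'the', 'rain', '[MASK]']
assert set(ranked) == set(vocab)  # nothing can be missing when k == |V|
```

That is why I'm confused about words from vocab.txt being absent from the top 400,000.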