I trained a tokenizer and a language model from scratch using a vocabulary size of 400,000.
Now, when I use my trained model for masked language modeling (MLM) with top_k=400000, some of the words in my vocab.txt do not appear in the results. For example, the word “rain” is in my vocab.txt, but when I run MLM inference, “rain” is nowhere in the top 400,000 predictions. Why is this?
Does it have something to do with the MLM masking probability of 0.15?
I am a bit confused.
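To illustrate what I expect top_k to do, here is a toy sketch with a tiny made-up vocabulary and made-up scores (not my actual model): if k equals the full vocabulary size, ranking by score should return every token, so nothing from the vocab should be missing.

```python
# Toy sketch: ranking an entire vocabulary by score (made-up values).
vocab = ["rain", "##rain", "sun", "[MASK]", "the"]  # tiny hypothetical vocab
scores = {"rain": -9.0, "##rain": 2.5, "sun": 1.0, "[MASK]": -20.0, "the": 0.3}

# top_k with k == len(vocab) just sorts ALL tokens by score, so every
# vocab entry must appear somewhere in the ranked list, however unlikely.
ranked = sorted(vocab, key=lambda t: scores[t], reverse=True)
print(ranked)  # ['##rain', 'sun', 'the', 'rain', '[MASK]']
assert set(ranked) == set(vocab)  # nothing can be missing when k == |V|
```

That is why I'm confused about words from vocab.txt being absent from the top 400,000.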