Why are my special tokens not appearing as predictions?

Hi,

I trained a masked language model on a Twitter dataset in which each tweet contains exactly one emoji. Then I used the following code to add the emojis to the tokenizer as special tokens:

num_added_toks = tokenizer.add_tokens([
    'πŸ˜ƒ', 'πŸ˜„', '😁', 'πŸ˜†', 'πŸ˜…', 'πŸ˜‚', '🀣', 'πŸ₯²', '☺️', '😊', 'πŸ˜‡', 'πŸ™‚', 'πŸ™ƒ', 'πŸ˜‰',
    '😌', '😍', 'πŸ₯°', '😘', 'πŸ˜—', 'πŸ˜™', '😚', 'πŸ˜‹', 'πŸ˜›', '😝', '😜', 'πŸ€ͺ', '🀨', '🧐',
    'πŸ€“', '😎', 'πŸ₯Έ', '🀩', 'πŸ₯³', '😏', 'πŸ˜’', '😞', 'πŸ˜”', '😟', 'πŸ˜•', 'πŸ™', '☹️', '😣',
    'πŸ˜–', '😫', '😩', 'πŸ₯Ί', '😒', '😭', '😀', '😠', '😑', '🀬', '🀯', '😳', 'πŸ₯΅', 'πŸ₯Ά',
    '😱', '😨', '😰', 'πŸ˜₯', 'πŸ˜“', 'πŸ€—', 'πŸ€”', '🀭', '🀫', 'πŸ€₯', 'πŸ§”πŸΏβ€β™‚οΈ',
])
print('We have added', num_added_toks, 'tokens')
model.resize_token_embeddings(len(tokenizer))  # Note: resize_token_embeddings expects the full size of the new vocabulary, i.e. the length of the tokenizer

Adding the tokens worked: 3311 different emojis were added successfully, which grew the embedding matrix to (53575, 768), as shown below:

We have added 3311 tokens

Embedding(53575, 768)
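For reference, here is the kind of sanity check I would expect to confirm that an emoji really got its own entry in the vocabulary (a minimal sketch using the tokenizer from the snippet above; the example emojis are just a few from the list):

# Each added emoji should map to a dedicated token ID (at or above the
# original vocabulary size) and should no longer be split into sub-tokens.
for emoji in ['πŸ˜ƒ', 'πŸ˜‚', 'πŸ₯²']:
    print(emoji, '->', tokenizer.convert_tokens_to_ids(emoji),
          tokenizer.tokenize(f'so happy {emoji}'))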

Now, here’s the issue I am facing: when I add the <mask> token to a sentence and set top_k to the total number of embeddings (53575), not a single emoji shows up in the predictions.

I used this line of code:

mask_filler("Are you happy today <mask>", top_k=53575)
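For context, mask_filler is a fill-mask pipeline built from the resized model and the extended tokenizer, roughly like this (a sketch; model and tokenizer are the objects from the snippet above):

from transformers import pipeline

# Fill-mask pipeline over the resized model and the extended tokenizer.
mask_filler = pipeline('fill-mask', model=model, tokenizer=tokenizer)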

As you can see in the code above, top_k is 53575, the total number of embeddings, which should include the 3311 emojis I added, right?

However, when I run the prediction and scroll through the full list of 53575 results, not a single emoji is there!
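As far as I understand, even scoring the tokens directly should surface them somewhere in the ranking, so here is the kind of raw check I mean (a diagnostic sketch, assuming a PyTorch masked-LM model; 'πŸ˜ƒ' is just one emoji from the list):

import torch

# Score one added emoji directly at the <mask> position, bypassing the pipeline.
inputs = tokenizer('Are you happy today <mask>', return_tensors='pt')
with torch.no_grad():
    logits = model(**inputs).logits
mask_pos = (inputs['input_ids'] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
emoji_id = tokenizer.convert_tokens_to_ids('πŸ˜ƒ')
print(logits[0, mask_pos, emoji_id])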

I am so confused as to why this is happening! I have added the emojis to the vocabulary, but they are simply not there when making predictions.

Can someone help me please?

Thanks!