Why are my special tokens not appearing as predictions?


I trained a masked language model on a Twitter dataset, with each tweet containing one emoji. Then, I used the following code to add the emojis as special tokens:

num_added_toks = tokenizer.add_tokens(['😃', ...])  # list truncated here; I passed all 3311 emojis
print('We have added', num_added_toks, 'tokens')
model.resize_token_embeddings(len(tokenizer))  # Notice: resize_token_embeddings expects the full size of the new vocabulary, i.e. len(tokenizer)

This added 3311 different emojis successfully, which increased the embedding matrix to (53575, 768), as shown below:

We have added 3311 tokens

Embedding(53575, 768)
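As a sanity check on those numbers (plain Python, no model needed), the resized embedding size is exactly the original vocabulary plus the 3311 added tokens:

```python
# Numbers taken from the output above; the original vocab size is implied.
num_added = 3311
new_vocab_size = 53575                    # matches Embedding(53575, 768)
old_vocab_size = new_vocab_size - num_added

assert old_vocab_size + num_added == new_vocab_size
print(old_vocab_size)  # size of the vocabulary before adding the emojis
```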

Now, here's the issue I am facing: when I add the <mask> token to a sentence and set top_k to the total number of embeddings, which is 53575, not a single emoji shows up in the predictions.

I used this line of code:

mask_filler("Are you happy today <mask>", top_k=53575)

As you can see in the code above, top_k is 53575, the total number of embeddings, which should include the 3311 emojis I added, right?
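My expectation can be illustrated with a toy sketch (plain Python, no model; the vocabulary and scores below are made up, not from my actual tokenizer):

```python
# Toy illustration: if top_k equals the vocabulary size, every token in the
# vocabulary should appear somewhere in the ranked predictions.
vocab = {"happy": 0, "sad": 1, "😃": 2, "😢": 3}           # made-up vocabulary
scores = {"happy": 9.1, "sad": 3.2, "😃": 0.4, "😢": 0.1}  # made-up logits

top_k = len(vocab)  # analogous to top_k=53575 above
predictions = sorted(scores, key=scores.get, reverse=True)[:top_k]

# Even a low-scoring token cannot be missing when top_k covers the whole vocab.
assert "😃" in predictions
```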

However, when I make the predictions and scroll through the list of 53575, not a single emoji is there!

I am so confused as to why this is happening! I have added the emojis to the vocabulary, but they are simply not there when making predictions.

Can someone help me please?