Hello,
I have trained a masked language model using my own dataset, which contains sentences with emojis (trained on 20,000 entries).
Now, when I make predictions, I want emojis to be in the output. However, most of the predicted tokens are words, so I suspect the emojis sit much further down the ranked list because they are less frequent than words in my data.
So far, this is my output - you can see that one emoji has been predicted, but the rest of the predictions are words:
mask_filler("I am so good today <mask>", top_k=5)
[{'score': 0.2953376770019531,
'sequence': 'I am so good today."',
'token': 72,
'token_str': '."'},
{'score': 0.18523386120796204,
'sequence': 'I am so good today 🙂',
'token': 328,
'token_str': '🙂'},
{'score': 0.1431082785129547,
'sequence': 'I am so good today!"',
'token': 2901,
'token_str': '!"'},
{'score': 0.13269349932670593,
'sequence': 'I am so good today.',
'token': 4,
'token_str': '.'},
{'score': 0.030341114848852158,
'sequence': 'I am so good today :)',
'token': 44660,
'token_str': ' :)'}]
Therefore, I was wondering whether there is any code or function that can filter the predictions so that only emojis appear in the output.
Only one emoji makes it into the top 5, and I assume the others are ranked lower because they are less frequent tokens.
So, is it possible to restrict the predictions to emojis and leave out the words?
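To make the question concrete, here is a minimal sketch of the kind of filtering I have in mind, not a definitive solution. It post-processes the list of dictionaries that the pipeline returns and keeps only entries whose `token_str` looks like an emoji. The helper names `is_emoji` and `keep_emoji_predictions` are hypothetical, and the emoji check is a rough heuristic based on the Unicode category "So" (Symbol, other), which also matches a few non-emoji symbols such as ©. The `sample` list simply reuses the output shown above.

```python
import unicodedata

# Characters that join or modify emojis without being emojis themselves.
ZERO_WIDTH_JOINER = "\u200d"
VARIATION_SELECTOR_16 = "\ufe0f"


def is_emoji(token_str):
    """Heuristic check (an assumption, not an official emoji test):
    every visible character must be in Unicode category 'So'."""
    text = token_str.strip()
    if not text:
        return False
    found_symbol = False
    for ch in text:
        if ch in (ZERO_WIDTH_JOINER, VARIATION_SELECTOR_16):
            continue  # glue characters inside emoji sequences
        if 0x1F3FB <= ord(ch) <= 0x1F3FF:
            continue  # skin-tone modifiers
        if unicodedata.category(ch) != "So":
            return False  # a letter, digit, or punctuation mark
        found_symbol = True
    return found_symbol


def keep_emoji_predictions(predictions, top_k=5):
    """Keep only emoji predictions, preserving the pipeline's ranking."""
    emoji_only = [p for p in predictions if is_emoji(p["token_str"])]
    return emoji_only[:top_k]


# The five predictions from the output above.
sample = [
    {"score": 0.2953376770019531, "sequence": 'I am so good today."',
     "token": 72, "token_str": '."'},
    {"score": 0.18523386120796204, "sequence": "I am so good today 🙂",
     "token": 328, "token_str": "🙂"},
    {"score": 0.1431082785129547, "sequence": 'I am so good today!"',
     "token": 2901, "token_str": '!"'},
    {"score": 0.13269349932670593, "sequence": "I am so good today.",
     "token": 4, "token_str": "."},
    {"score": 0.030341114848852158, "sequence": "I am so good today :)",
     "token": 44660, "token_str": " :)"},
]

filtered = keep_emoji_predictions(sample)
for prediction in filtered:
    print(prediction["token_str"], prediction["score"])
```

With the real pipeline, the idea would be to request many more candidates first, for example `mask_filler("I am so good today <mask>", top_k=500)`, and pass that result to `keep_emoji_predictions`. As far as I can tell, the fill-mask pipeline also accepts a `targets` argument that restricts scoring to a given list of tokens, which might be an alternative if the set of emojis is known in advance.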
Thanks.