How to filter predicted tokens in masked language modelling?

anon58275033 · July 23, 2021, 4:15pm

Hello,

I have trained a masked language model on my own dataset of 20,000 entries, which contains sentences with emojis.

Now, when I make predictions, I want emojis in the output. However, most of the predicted tokens are words, so I think the emojis sit much further down the ranking, presumably because they are less frequent than the words.

So far, this is my output - you can see that one emoji has been predicted, but the rest of the predictions are words:

mask_filler("I am so good today <mask>", top_k=5)

[{'score': 0.2953376770019531,
  'sequence': 'I am so good today."',
  'token': 72,
  'token_str': '."'},
 {'score': 0.18523386120796204,
  'sequence': 'I am so good today 🙂',
  'token': 328,
  'token_str': '🙂'},
 {'score': 0.1431082785129547,
  'sequence': 'I am so good today!"',
  'token': 2901,
  'token_str': '!"'},
 {'score': 0.13269349932670593,
  'sequence': 'I am so good today.',
  'token': 4,
  'token_str': '.'},
 {'score': 0.030341114848852158,
  'sequence': 'I am so good today :)',
  'token': 44660,
  'token_str': ' :)'}]

Therefore, I was wondering whether there is any code or function that can filter the predictions so that only emojis appear in the output.

Only one emoji shows up in the top 5 at the moment. I think the other emojis are less frequent tokens, so they do not rank near the top when I make predictions.

So, is it possible to filter the predictions so that the emojis are kept and the words are dropped?

Thanks.
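
For illustration, here is a minimal sketch of the kind of filter being asked about. It post-processes a list of prediction dicts shaped like the output above and keeps only the entries whose `token_str` looks like an emoji. The helper names `is_emoji_token` and `keep_emoji_predictions` are hypothetical, and the emoji check is only a heuristic based on Unicode categories, so it will also accept other symbols such as `©` and will not treat text emoticons like `:)` as emojis.

```python
import unicodedata

# Unicode categories allowed inside an emoji token (heuristic, an assumption):
# So = other symbol (most emojis), Sk = modifier symbol (skin tones),
# Mn = nonspacing mark (variation selectors), Cf = format (zero-width joiner).
EMOJI_CATEGORIES = {"So", "Sk", "Mn", "Cf"}


def is_emoji_token(token_str):
    """Return True if the token consists only of emoji-like characters."""
    text = token_str.strip()
    if not text:
        return False
    categories = [unicodedata.category(ch) for ch in text]
    return "So" in categories and all(c in EMOJI_CATEGORIES for c in categories)


def keep_emoji_predictions(predictions):
    """Keep only predictions whose token looks like an emoji, preserving order."""
    return [p for p in predictions if is_emoji_token(p["token_str"])]


# The predictions copied from the post above ('sequence' omitted for brevity).
sample = [
    {"score": 0.2953376770019531, "token": 72, "token_str": '."'},
    {"score": 0.18523386120796204, "token": 328, "token_str": "🙂"},
    {"score": 0.1431082785129547, "token": 2901, "token_str": '!"'},
    {"score": 0.13269349932670593, "token": 4, "token_str": "."},
    {"score": 0.030341114848852158, "token": 44660, "token_str": " :)"},
]

emoji_only = keep_emoji_predictions(sample)
for prediction in emoji_only:
    print(prediction["token_str"], prediction["score"])
```

To surface emojis that rank lower, the same filter could be applied to a much larger candidate list, for example `keep_emoji_predictions(mask_filler("I am so good today <mask>", top_k=1000))`, where the value 1000 is arbitrary. The fill-mask pipeline in `transformers` also accepts a `targets` argument that restricts scoring to a given list of candidate tokens, which may be a simpler option if the set of emojis is known in advance.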
