Is it possible to filter the predicted tokens in masked language modelling?

anon58275033 · July 26, 2021, 4:14pm

I have trained a masked language model using my own dataset, which contains sentences with emojis (trained on 20,000 entries).

Now, when I make predictions, I want emojis to appear in the output. However, most of the predicted tokens are words, so I think the emojis are ranked near the bottom of the list, as they must be less frequent tokens than the words.

So far, this is my output - you can see that one emoji has been predicted, but the rest of the predictions are words:

mask_filler("I am so good today, <mask>", top_k=5)

[{'score': 0.2953376770019531,
  'sequence': 'I am so good today, friend',
  'token': 72,
  'token_str': 'friend'},
 {'score': 0.18523386120796204,
  'sequence': 'I am so good today 🙂',
  'token': 328,
  'token_str': '🙂'},
 {'score': 0.1431082785129547,
  'sequence': 'I am so good today, mate',
  'token': 2901,
  'token_str': 'mate'},
 {'score': 0.13269349932670593,
  'sequence': 'I am so good today, father',
  'token': 4,
  'token_str': 'father'},
 {'score': 0.030341114848852158,
  'sequence': 'I am so good today, mother',
  'token': 44660,
  'token_str': 'mother'}]

Therefore, I was wondering whether there is any code or function that can filter the predictions so that only emojis appear in the output, removing any predicted tokens that are words.

I have got one emoji to show in the output, but I think the rest of the emojis are less frequent tokens, so they do not appear at the top when I make predictions.

So, is it possible to filter out the word tokens in favour of only emojis?

I am so close to getting emojis as my predicted tokens, so I just need a little help, please.

Thanks.
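
One possible approach, as a minimal sketch rather than a definitive solution: request a much longer prediction list from the pipeline and then keep only the entries whose `token_str` looks like an emoji. The helper names `is_emoji` and `keep_emoji_predictions` are hypothetical (they are not part of any library), and the emoji check is a heuristic based on Unicode character categories, so it may also accept other symbols such as ©. It assumes each emoji is a single token in the vocabulary, as in the output above.

```python
import unicodedata

# Unicode categories that typically make up an emoji sequence:
# So = symbols (most emoji), Sk = skin-tone modifiers,
# Mn = variation selectors, Cf = zero-width joiner.
EMOJI_CATEGORIES = {"So", "Sk", "Mn", "Cf"}


def is_emoji(token_str):
    """Heuristic check: True if the token contains only emoji-like characters."""
    text = token_str.strip()  # some tokenizers decode tokens with a leading space
    if not text:
        return False
    categories = [unicodedata.category(ch) for ch in text]
    return "So" in categories and all(c in EMOJI_CATEGORIES for c in categories)


def keep_emoji_predictions(predictions, top_k=5):
    """Drop word tokens and return at most top_k emoji predictions.

    `predictions` is a list of dicts shaped like the fill-mask output above.
    The input order (highest score first) is preserved.
    """
    emojis = [p for p in predictions if is_emoji(p["token_str"])]
    return emojis[:top_k]


# Sample data copied from the output in the post (shortened to three entries).
sample = [
    {"score": 0.2953376770019531, "token": 72, "token_str": "friend"},
    {"score": 0.18523386120796204, "token": 328, "token_str": "🙂"},
    {"score": 0.1431082785129547, "token": 2901, "token_str": "mate"},
]
for prediction in keep_emoji_predictions(sample):
    print(prediction["token_str"], prediction["score"])
```

With the pipeline from the post, this would be used as `keep_emoji_predictions(mask_filler("I am so good today, <mask>", top_k=1000))`, where `1000` is an arbitrary value chosen so that lower-ranked emojis are included before filtering. The fill-mask pipeline also accepts a `targets` argument that restricts scoring to a given list of tokens, which may be simpler if the full set of emojis in the vocabulary is known in advance.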
