[HELP] How to include emojis in masked language modelling?

anon58275033 · June 8, 2021, 4:20pm

Hello,

I am new to Hugging Face and masked language modelling (MLM), and I was wondering how to include emojis when doing such a task.

I have followed this tutorial: notebooks/language_modeling.ipynb at master · huggingface/notebooks · GitHub

I have a dataset with tweets, with each tweet containing an emoji at the end - here is a sample of my data:

ID	Tweet
1	Looking good today
2	Weather is so hot, lol
3	I hate you!!!

At the moment, I have fully trained my masked language model using my dataset, but when I predict something, it does NOT output or predict the emojis. It just predicts words.

This is the input I want to give the trained model:

"You look great [MASK]"

This is the output I would like to get, with emojis filling the mask:

[{'score': 0.26041436195373535,
  'sequence': 'You look great 😎"',
  'token': 72,
  'token_str': '."'},
 {'score': 0.1813151091337204,
  'sequence': 'you look great 💯"',
  'token': 2901,
  'token_str': '!"'},
 {'score': 0.14516998827457428,
  'sequence': 'you look great 👌',
  'token': 328,
  'token_str': '!'},]

However, this is what I am actually getting from my output:

[{'score': 0.26041436195373535,
  'sequence': 'You look great?"',
  'token': 72,
  'token_str': '."'},
 {'score': 0.1813151091337204,
  'sequence': 'You look great."',
  'token': 2901,
  'token_str': '!"'},
 {'score': 0.14516998827457428,
  'sequence': 'You look great!',
  'token': 328,
  'token_str': '!'},]

I know this is possible, but how do I do it? I feel I am close, but not quite there.

To summarise: my model is fully trained on my dataset, but it never outputs emojis, even though they were included in the training data.

Does something need to be added to the tokenizer or model so that it accepts emojis? If so, what?
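One thing worth checking is whether the tokenizer has each emoji as a single token at all. Depending on the tokenizer, an emoji can be mapped to the unknown token or split into several sub-tokens, and in either case a single [MASK] cannot be filled with it. Below is a minimal sketch of that check and one possible fix, not a confirmed solution. The helper names (`find_emojis`, `emojis_needing_own_token`, `add_emoji_tokens`) are hypothetical, and treating Unicode category "So" as "emoji" is a rough assumption that misses modifiers and joined sequences. The only library calls used are `tokenizer.tokenize`, `tokenizer.unk_token`, `tokenizer.add_tokens` and `model.resize_token_embeddings`.

```python
import unicodedata


def find_emojis(texts):
    """Collect distinct characters in Unicode category 'So' (Symbol, other).

    Assumption: 'So' is only a rough proxy for emojis. It also matches
    symbols such as the copyright sign and ignores skin-tone modifiers.
    """
    found = []
    for text in texts:
        for ch in text:
            if unicodedata.category(ch) == "So" and ch not in found:
                found.append(ch)
    return found


def emojis_needing_own_token(tokenizer, emojis):
    """Return the emojis that the tokenizer cannot represent as one token.

    An emoji is flagged when it becomes the unknown token or is split
    into more than one piece.
    """
    missing = []
    for emoji in emojis:
        pieces = tokenizer.tokenize(emoji)
        if len(pieces) != 1 or pieces[0] == tokenizer.unk_token:
            missing.append(emoji)
    return missing


def add_emoji_tokens(tokenizer, model, emojis):
    """Add each emoji as a whole token and resize the model embeddings.

    The new embedding rows are freshly initialised, so the model still
    has to be trained on the tweets after this step.
    """
    num_added = tokenizer.add_tokens(emojis)
    model.resize_token_embeddings(len(tokenizer))
    return num_added
```

With a tokenizer and model loaded through `AutoTokenizer.from_pretrained` and `AutoModelForMaskedLM.from_pretrained`, the idea would be to call `find_emojis` on the tweet column, pass the result to `emojis_needing_own_token`, and hand whatever is returned to `add_emoji_tokens` before running the training from the tutorial again. Whether this is enough for the emojis to show up among the top predictions is something I cannot confirm.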

Thanks - I would really appreciate the help!
