Hello,
I am new to Hugging Face and masked language modelling (MLM), and I was wondering how to include emojis when doing such a task.
I have followed this tutorial: the language_modeling.ipynb notebook in the huggingface/notebooks repository on GitHub.
I have a dataset with tweets, with each tweet containing an emoji at the end - here is a sample of my data:
| ID | Tweet |
|---|---|
| 1 | Looking good today |
| 2 | Weather is so hot, lol |
| 3 | I hate you!!! |
I have trained my masked language model on this dataset, but when I run predictions it never outputs emojis. It only predicts words or punctuation.
This is the input I would like to use:
"You look great [MASK]"
This is the kind of output I would like (only the 'sequence' values matter here; the scores and tokens are copied from my actual output below):
[{'score': 0.26041436195373535,
'sequence': 'You look great 😎',
'token': 72,
'token_str': '."'},
{'score': 0.1813151091337204,
'sequence': 'You look great 💯',
'token': 2901,
'token_str': '!"'},
{'score': 0.14516998827457428,
'sequence': 'You look great 👌',
'token': 328,
'token_str': '!'},]
However, this is what I actually get:
[{'score': 0.26041436195373535,
'sequence': 'You look great?"',
'token': 72,
'token_str': '."'},
{'score': 0.1813151091337204,
'sequence': 'You look great."',
'token': 2901,
'token_str': '!"'},
{'score': 0.14516998827457428,
'sequence': 'You look great!',
'token': 328,
'token_str': '!'},]
I believe this is possible, but I cannot work out how to do it. The emojis were included in the training data, yet the trained model never predicts them.
Does something need to be added so that the model accepts emojis? If so, what?
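One thing that may be worth checking is whether the tokenizer keeps each emoji as a single token. If an emoji maps to the unknown token, or is split into several sub-word pieces, a single masked position cannot be filled with it. The sketch below is only a guess at the cause, not something from the tutorial. The helper names (extract_emojis, emojis_needing_own_token, add_emoji_tokens) are made up for illustration, and treating Unicode category "So" as "emoji" is a rough heuristic. The tokenizer and model methods used (tokenize, unk_token, add_tokens, resize_token_embeddings) are the standard transformers ones.

```python
import unicodedata


def extract_emojis(texts):
    """Collect unique characters in Unicode category 'So' (rough emoji heuristic)."""
    found = []
    for text in texts:
        for ch in text:
            if unicodedata.category(ch) == "So" and ch not in found:
                found.append(ch)
    return found


def emojis_needing_own_token(tokenizer, emojis):
    """Return emojis that become the unknown token or more than one piece."""
    missing = []
    for emoji in emojis:
        pieces = tokenizer.tokenize(emoji)
        if len(pieces) != 1 or pieces[0] == tokenizer.unk_token:
            missing.append(emoji)
    return missing


def add_emoji_tokens(tokenizer, model, emojis):
    """Add emojis to the vocabulary and resize the model's embeddings to match."""
    num_added = tokenizer.add_tokens(emojis)
    if num_added > 0:
        # The embedding matrix must cover the enlarged vocabulary.
        model.resize_token_embeddings(len(tokenizer))
    return num_added
```

With transformers, the tokenizer and model would come from AutoTokenizer.from_pretrained and AutoModelForMaskedLM.from_pretrained for whatever checkpoint the training started from, and add_emoji_tokens would be called before tokenizing the dataset and training. The embeddings for newly added tokens start out untrained, so the model has to be trained again after this step before it can predict them.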
Thanks - I would really appreciate the help!