Hi,
Is it possible to see all the predicted tokens for masked language modelling, specifically the tokens with a low probability?
For example, consider this call to a fill-mask pipeline and its output:
unmasker("I am feeling <mask> today")
[{'score': 0.5322356820106506,
'sequence': 'I am feeling good today',
'token': 4,
'token_str': 'good'},
{'score': 0.1725485771894455,
'sequence': 'I am feeling happy today!',
'token': 328,
'token_str': 'happy'},
{'score': 0.1252109706401825,
'sequence': 'I am feeling sad today."',
'token': 72,
'token_str': 'sad"'},
{'score': 0.01904081553220749,
'sequence': 'I am feeling angry today!"',
'token': 2901,
'token_str': 'angry'},
{'score': 0.012199202552437782,
'sequence': 'I am feeling fun today…',
'token': 1174,
'token_str': 'fun'}]
As you can see from my output, the top tokens are “good”, “happy”, “sad”, “angry” and “fun”. However, would it be possible to see all the predicted tokens beyond the top 5?
I would like a list of every predicted token with its score, not just the top 5, including the ones with the lowest probability, if that is possible.
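To make the request concrete, here is a minimal sketch of the kind of output I am after, assuming I could get the raw scores (logits) at the mask position. The function name `rank_all_tokens`, the toy vocabulary and the toy logits are all made up for illustration; they do not come from my model. I believe the fill-mask pipeline also accepts a `top_k` argument (`topk` in older releases) that could be raised to the vocabulary size, but I have not confirmed that for my version.

```python
import math

def rank_all_tokens(logits, vocab, ascending=True):
    """Return (token, probability) for every vocabulary entry, sorted by probability."""
    # Numerically stable softmax: subtract the largest logit before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    ranked = [(tok, e / total) for tok, e in zip(vocab, exps)]
    # ascending=True lists the least likely tokens first.
    return sorted(ranked, key=lambda pair: pair[1], reverse=not ascending)

# Hypothetical toy data, NOT real model output.
toy_vocab = ["good", "happy", "sad", "angry", "fun", "purple"]
toy_logits = [4.0, 2.9, 2.6, 0.7, 0.3, -3.0]

for token, prob in rank_all_tokens(toy_logits, toy_vocab):
    print(f"{token:>8} {prob:.6f}")
```

With a real model the vocabulary would have tens of thousands of entries, so the full list would be that long too.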
Thanks.