Is it possible to see all the token rankings for masked language modelling?


I was just wondering whether it would be possible to see all the predicted tokens for masked language modelling, specifically the tokens with a low probability.

For example, consider this output from a fill-mask pipeline:

unmasker("I am feeling <mask> today")
[{'score': 0.5322356820106506,
  'sequence': 'I am feeling good today',
  'token': 4,
  'token_str': 'good'},
 {'score': 0.1725485771894455,
  'sequence': 'I am feeling happy today',
  'token': 328,
  'token_str': 'happy'},
 {'score': 0.1252109706401825,
  'sequence': 'I am feeling sad today',
  'token': 72,
  'token_str': 'sad'},
 {'score': 0.01904081553220749,
  'sequence': 'I am feeling angry today',
  'token': 2901,
  'token_str': 'angry'},
 {'score': 0.012199202552437782,
  'sequence': 'I am feeling fun today',
  'token': 1174,
  'token_str': 'fun'}]

As you can see from my output, the top tokens are “good”, “happy”, “sad”, “angry” and “fun”. However, would it be possible to see all the predicted tokens beyond the top 5?

I just want to see a list of all the predicted tokens, including the ones with the lowest probability, if this is possible.

I don’t want to see only the top 5 predictions; I want to see all of them.


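I guess the most flexible way to do that is to work with the model directly, outside the pipeline. You can then apply a softmax to the logits at the masked position to get the probability of every token in the vocabulary, and sort those probabilities to see the full ranking, including the least likely tokens.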
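Here is a minimal sketch of that idea, assuming a Hugging Face `transformers` masked-LM model and its matching tokenizer, with PyTorch installed. The helper names `softmax_ranking` and `rank_masked_tokens` are hypothetical, not part of any library. The softmax and sorting are done in plain Python so that part can be checked without a model.

```python
import math


def softmax_ranking(logits, id_to_token):
    """Convert the raw logits for one masked position into (token, probability)
    pairs covering the whole vocabulary, sorted from most to least likely."""
    # Subtract the max logit before exponentiating for numerical stability.
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort token ids by probability, highest first; the tail of the list
    # holds the lowest-probability tokens.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id_to_token[i], probs[i]) for i in order]


def rank_masked_tokens(text, model, tokenizer):
    """Hypothetical helper: rank every vocabulary token for the first mask in `text`.

    Assumes a Hugging Face masked-LM `model` and matching `tokenizer`.
    torch is imported here so softmax_ranking stays usable without it.
    """
    import torch

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)
    # Position of the first mask token in the input sequence.
    is_mask = inputs["input_ids"][0] == tokenizer.mask_token_id
    mask_pos = is_mask.nonzero(as_tuple=True)[0][0].item()
    row = logits[0, mask_pos].tolist()
    # Decode each id on its own to get a readable string per token.
    id_to_token = [tokenizer.decode([i]) for i in range(len(row))]
    return softmax_ranking(row, id_to_token)
```

For example, after loading `tokenizer = AutoTokenizer.from_pretrained("roberta-base")` and `model = AutoModelForMaskedLM.from_pretrained("roberta-base")` (roberta-base is only a guess based on the `<mask>` token), `rank_masked_tokens("I am feeling <mask> today", model, tokenizer)[-10:]` would give the ten least likely tokens. Separately, recent versions of `transformers` let the fill-mask pipeline take a `top_k` argument, so something like `unmasker("I am feeling <mask> today", top_k=unmasker.tokenizer.vocab_size)` may also return every token; check the documentation for your installed version.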