Hi,
I’ve recently discovered the power of the fill-mask pipeline from Hugging Face, and while playing with it, I found that it has trouble handling out-of-vocabulary words.
For example, in the sentence, “The internal analysis indicates that the company has reached a [MASK] level.”, I would like to know which one of these words [‘good’, ‘virtuous’, ‘obedient’] is the most probable according to the bert-large-cased-whole-word-masking model.
The model refuses to score the words virtuous and obedient because they do not exist in the vocabulary as single tokens. Instead, the scores are assigned to the first recognized subtoken of each word (v and o), which is not useful.
So the question remains: how can I get prediction scores for whole words rather than for individual subword tokens?
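One workaround I can sketch (this is an approximation, not a built-in feature of the pipeline): expand the single [MASK] into one mask per subtoken of each candidate word, run the model once, and sum the log-probabilities of the candidate's subtokens at the mask positions. The model name below (bert-base-cased, chosen to keep the example light; swap in bert-large-cased-whole-word-masking for the original setup) and the length normalization are my own choices, and filling all masks in a single pass only approximates the joint probability of the subtokens:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

MODEL = "bert-base-cased"  # illustration; use bert-large-cased-whole-word-masking in practice

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForMaskedLM.from_pretrained(MODEL)
model.eval()

def whole_word_score(sentence: str, word: str) -> float:
    """Score a (possibly multi-token) candidate for a single [MASK] slot.

    The slot is expanded to as many [MASK] tokens as the candidate has
    subtokens; the subtoken log-probabilities at those positions are
    summed and length-normalized so words of different lengths compare.
    """
    sub_ids = tokenizer(word, add_special_tokens=False)["input_ids"]
    masks = " ".join([tokenizer.mask_token] * len(sub_ids))
    enc = tokenizer(sentence.replace("[MASK]", masks), return_tensors="pt")
    mask_pos = (enc["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    with torch.no_grad():
        logits = model(**enc).logits[0]
    log_probs = torch.log_softmax(logits, dim=-1)
    total = sum(log_probs[p, t].item() for p, t in zip(mask_pos, sub_ids))
    return total / len(sub_ids)

sentence = "The internal analysis indicates that the company has reached a [MASK] level."
scores = {w: whole_word_score(sentence, w) for w in ["good", "virtuous", "obedient"]}
print(scores)
```

Note that all masks are predicted independently in one forward pass, so multi-token words are slightly disadvantaged; predicting the subtokens left-to-right and re-running the model after filling each one would be a closer (but slower) approximation.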