The bert-base-uncased tokenizer splits the word “rectangle” into two tokens. How does the fill-mask pipeline handle a sentence in which a word like “rectangle” is whole-word masked?
For example, if the original sentence is “My car is rectangle shape” and “My car is [MASK] shape” is given as input, can the whole word “rectangle” be restored? If not, what is the general approach to recovering whole words?
My task is to determine which of the shape words triangle, circle, and rectangle has the highest score for that [MASK].
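In case it helps frame the question: since the fill-mask pipeline only predicts one token per [MASK], one common workaround I have seen for multi-token candidates is to expand the [MASK] into one mask per subword of the candidate and sum per-subword log-probabilities (a pseudo-log-likelihood). This is only a sketch of that idea, assuming `transformers` and `torch` are installed; `score_candidate` is a hypothetical helper name, not a library function:

```python
# Sketch: score multi-token candidate words for a single [MASK] slot
# with bert-base-uncased. Assumes `transformers` and `torch` are installed.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def score_candidate(template: str, candidate: str) -> float:
    """Pseudo-log-likelihood of `candidate` filling the [MASK] in `template`.

    The candidate may tokenize into several subwords; we insert one mask
    token per subword, then re-mask one subword at a time and sum the
    log-probabilities the model assigns to the true subword ids.
    """
    cand_ids = tokenizer(candidate, add_special_tokens=False)["input_ids"]
    # Expand the single [MASK] into one mask token per candidate subword.
    masks = " ".join([tokenizer.mask_token] * len(cand_ids))
    enc = tokenizer(template.replace(tokenizer.mask_token, masks),
                    return_tensors="pt")
    mask_positions = (enc["input_ids"][0] == tokenizer.mask_token_id) \
        .nonzero(as_tuple=True)[0]

    total = 0.0
    for i, pos in enumerate(mask_positions):
        filled = enc["input_ids"].clone()
        # Fill in all candidate subwords except the one being predicted.
        for j, p in enumerate(mask_positions):
            filled[0, p] = cand_ids[j] if j != i else tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(input_ids=filled,
                           attention_mask=enc["attention_mask"]).logits
        log_probs = torch.log_softmax(logits[0, pos], dim=-1)
        total += log_probs[cand_ids[i]].item()
    return total

sentence = "My car is [MASK] shape"
scores = {w: score_candidate(sentence, w)
          for w in ["triangle", "circle", "rectangle"]}
best = max(scores, key=scores.get)
print(best, scores)
```

One caveat with this approach: candidates with more subwords accumulate more (negative) log-probability terms, so some people length-normalize the sum before comparing candidates.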