I implemented the W-NUT Emerging Entities example and used the model/tokenizer on new sentences. My example sentence has fewer tokens (31) than the maximum length used for training (86). My input IDs are correct (padded with 0) and the attention mask is 1 only for the first 31 tokens. When I inspected the results, I noticed that the pad tokens were classified too, and not only as O but with other entity labels as well.
Is this behavior correct? Can we avoid it, or do we have to truncate the results to the original sentence/label length?
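For context, the model does emit logits for every padded position, since nothing in the forward pass zeroes them out, so the usual approach is exactly the truncation the last sentence mentions: discard predictions wherever the attention mask is 0. A minimal sketch of that filtering, with the logits simulated by random numbers instead of a real model output (shapes match the post: padded length 86, real length 31, and 13 is a placeholder label count):

```python
import numpy as np

# Simulated stand-ins: in practice `logits` comes from the model's forward
# pass and `attention_mask` from the tokenizer.
seq_len, num_labels, real_len = 86, 13, 31

rng = np.random.default_rng(0)
logits = rng.normal(size=(1, seq_len, num_labels))  # (batch, seq_len, labels)
attention_mask = np.zeros((1, seq_len), dtype=int)
attention_mask[0, :real_len] = 1                    # 1 = real token, 0 = padding

predictions = logits.argmax(axis=-1)                # predicted label id per position

# Keep only the predictions at real (non-pad) positions.
valid_predictions = predictions[attention_mask == 1]

print(valid_predictions.shape)  # (31,)
```

The same boolean-mask indexing works on a PyTorch tensor of logits; the point is simply that whatever the model predicts at padded positions is meaningless and should be dropped before mapping ids back to label strings.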