Actually, that’s a design choice, you can label all subtokens of a word with the same label, or (and this is more commonly done), only label the first subtoken of a word and label the rest with -100, such that they will not be taken into account by the loss function.
1 Like