How to use additional input features for NER?

Actually, that's a design choice. You can either label all subtokens of a word with the same label or, as is more commonly done, label only the first subtoken of each word and assign -100 to the rest, so that the loss function ignores them.
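Here is a minimal sketch of both options, assuming you already have the per-token word indices that a fast tokenizer returns from `word_ids()` (with `None` for special tokens). The helper name `align_labels`, the `label_all_subtokens` flag, and the example ids are made up for illustration. The value -100 is the default `ignore_index` of PyTorch's `CrossEntropyLoss`.

```python
from typing import List, Optional

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def align_labels(
    word_ids: List[Optional[int]],
    word_labels: List[int],
    label_all_subtokens: bool = False,
) -> List[int]:
    """Expand word-level labels to subtoken-level labels (hypothetical helper)."""
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            # Special tokens such as [CLS] / [SEP] never contribute to the loss.
            aligned.append(IGNORE_INDEX)
        elif word_id != previous:
            # First subtoken of a word always gets the word's label.
            aligned.append(word_labels[word_id])
        else:
            # Later subtokens: copy the label or mask them out.
            aligned.append(word_labels[word_id] if label_all_subtokens else IGNORE_INDEX)
        previous = word_id
    return aligned


# Made-up example: tokens "[CLS] Hug ##ging Face rocks [SEP]"
word_ids = [None, 0, 0, 1, 2, None]
word_labels = [1, 2, 0]  # one label id per word

first_only = align_labels(word_ids, word_labels)
all_subtokens = align_labels(word_ids, word_labels, label_all_subtokens=True)
```

Note that if you use a B-/I- tagging scheme and label all subtokens, you may prefer to give continuation subtokens the I- variant instead of repeating a B- label.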
