Token Classification Model making mistakes outside of the training dataset

I have fine-tuned both BERT and ALBERT on a token classification task, a kind of key-phrase extraction where the model receives a paragraph and has to select all the key phrases in it. I use a beginning-middle annotation scheme: the first token of a key phrase is labeled "1" and the remaining tokens of the phrase are labeled "2". However, both models fine-tuned on my dataset (more than 10,000 training samples) make mistakes where they never place a "1" on the first token of a key phrase and only place "2"s. This error occurred roughly 1,000 times on a 2,000-sample validation set over the course of training (I checked for it every 500 optimization steps). Why is this happening, given that my training dataset contains no such labeling errors?
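
For reference, this is roughly how I count that error type during evaluation (a minimal sketch, assuming label ids 0 = outside, 1 = begin, 2 = inside; the helper name is just illustrative):

```python
def count_orphan_inside_tags(predicted_labels):
    """Count positions where an inside tag ('2') starts a phrase
    instead of following a begin tag ('1') or another inside tag."""
    errors = 0
    for sequence in predicted_labels:
        previous = 0  # treat the start of the sequence as "outside"
        for label in sequence:
            # a '2' is only valid right after a '1' or another '2'
            if label == 2 and previous not in (1, 2):
                errors += 1
            previous = label
    return errors

# Example: the second sequence starts a phrase with '2', so it counts as one error.
predictions = [
    [0, 1, 2, 2, 0],   # valid: phrase begins with 1
    [0, 2, 2, 0, 1],   # invalid: phrase begins with 2
]
print(count_orphan_inside_tags(predictions))  # -> 1
```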

Could you please take a look at this? @valhalla or @lysandre