Restricting BERT scores: methods to counter high-confidence classification of short non-word-like phrases

Hello.
Recently I’ve been practicing sequence classification with BERT language models via the transformers library.
I’ve noticed that BERT assigns high-confidence scores to nonsense phrases like “asdfgh”. I’ve been able to reproduce this unwanted (in this case) behavior across many different models trained on different datasets (with plenty of examples and trained for many epochs). A minimal repro is sketched below.
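Here is roughly what I mean (the checkpoint name is just a placeholder for any fine-tuned sequence-classification model):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# "my-finetuned-bert" is a placeholder; substitute your own checkpoint
model_name = "my-finetuned-bert"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

inputs = tokenizer("asdfgh", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)
print(probs.max().item())  # often close to 1.0, even for gibberish input
```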
I’m pretty sure there are others like me who have wanted to curb classification confidence for non-word-like phrases.
Yet I haven’t come up with a complete solution. I’ve tried spellcheckers, regular expressions, and a heuristic based on BERT’s subword splitting (see the sketch below), but each of these only partially helps.
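For reference, the subword-splitting heuristic looks roughly like this (the tokenizer and the interpretation of the ratio are illustrative; the idea is that gibberish tends to fragment into many subword pieces per word):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def subword_fragmentation(text: str) -> float:
    # Ratio of subword pieces to whitespace-separated words.
    # Gibberish fragments heavily, e.g. "asdfgh" -> "as", "##df", "##gh",
    # while common words usually map to a single piece each.
    words = text.split()
    if not words:
        return 0.0
    pieces = tokenizer.tokenize(text)
    return len(pieces) / len(words)

print(subword_fragmentation("asdfgh"))       # high ratio -> suspicious
print(subword_fragmentation("hello world"))  # ~1.0 -> looks like real words
```

The trouble is that rare but legitimate words (names, domain jargon) also fragment heavily, so a threshold on this ratio produces false positives.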
Has anyone been interested in a similar case?
Were you able to deal with it somehow? If yes, could you tell me what approach you took?
Thank you.