I have fine-tuned a pre-trained BERT model from the Hugging Face library on the Jigsaw Toxic Comment Classification dataset to detect hateful comments. However, when I run inference on positive sentences, it gives me wrong results.
My accuracy comes out between 97% and 98% every time.
I have trained with:
epochs: 4, 5, 6
learning rate: 2e-5, 3e-5, 5e-5
pre-trained models used: bert_base_uncased, RoBERTa, XLM-R (for the combined German and English dataset)
Previously my classification report showed very low scores since the dataset was highly imbalanced. To overcome this, I augmented the data by back-translation (ENG → DE → ENG, ENG → FR → ENG, ENG → IT → ENG). After augmenting, I get better precision, recall, and F1 scores; however, when I run inference on a few positive comments, the model gives the wrong sentiment score.
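The back-translation augmentation described above can be sketched as below; `translate(text, src, tgt)` is a hypothetical stand-in for whatever machine-translation system is used, not a function from my actual code:

```python
# Sketch of the ENG -> pivot -> ENG augmentation described above.
# `translate` is a hypothetical callable: translate(text, src_lang, tgt_lang) -> str.
def back_translate(text, translate, pivots=("de", "fr", "it")):
    """Return round-trip paraphrases of `text` via each pivot language."""
    augmented = []
    for pivot in pivots:
        intermediate = translate(text, "en", pivot)   # ENG -> pivot
        round_trip = translate(intermediate, pivot, "en")  # pivot -> ENG
        if round_trip and round_trip != text:  # keep only genuinely new variants
            augmented.append(round_trip)
    return augmented
```

Each minority-class comment then contributes up to three paraphrases to the training set.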
For example, if I give the input "You are a nice person", the model infers it as either toxic or insult. I am not sure how to resolve this issue; any suggestion will be really helpful.
test_comment = "You are a nice person"
encoding = tokenizer.encode_plus(
    test_comment,
    add_special_tokens=True,
    max_length=512,
    return_token_type_ids=False,
    padding="max_length",
    return_attention_mask=True,
    return_tensors="pt",
)

# Run inference without tracking gradients so the output tensor
# can be converted to NumPy directly.
with torch.no_grad():
    _, test_prediction = trained_model(encoding["input_ids"], encoding["attention_mask"])
test_prediction = test_prediction.flatten().numpy()

for label, prediction in zip(LABEL_COLUMNS, test_prediction):
    print(f"{label}: {prediction}")
Result:
toxic: 0.289602130651474
severe_toxic: 0.012312621809542179
obscene: 0.26335516571998596
threat: 0.0017053773626685143
insult: 0.54698246717453
identity_hate: 0.0013856851728633046
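For context, these are independent per-label scores from a multi-label head, so I read off the prediction by thresholding each score separately; a minimal sketch using the scores above (the 0.5 threshold is an assumption, just a common default):

```python
# Per-label scores copied (rounded) from the result above.
LABEL_COLUMNS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]
scores = [0.2896, 0.0123, 0.2634, 0.0017, 0.5470, 0.0014]

THRESHOLD = 0.5  # assumed cutoff, not tuned
predicted = [label for label, score in zip(LABEL_COLUMNS, scores) if score >= THRESHOLD]
print(predicted)  # ['insult']
```

So for "You are a nice person" the model flags insult, which is clearly wrong.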