Weird results when testing a NER model

I am working on a NER task, fine-tuning the camembert-base model with PyTorch. The model is not giving good results yet, but at least the labels of some sentences from the training corpus are predicted correctly.

The training is done using this dataset; more details can be found in this notebook.

The weird thing about this model is that in the test phase, some labels get a precision, recall, and F1 score of zero. I think this means the model did not manage to predict any entities for those labels, which seems very unlikely to me: even a model making random guesses should get at least one entity of each label right.
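To verify that hypothesis, here is roughly how I count the labels the model actually predicts on a set of sentences (a minimal sketch: model and tokenizer come from my fine-tuning run, id2label is the usual id-to-label mapping from the model config, and predicted_label_counts is just a name I chose):

from collections import Counter

import torch

def predicted_label_counts(model, tokenizer, sentences, id2label):
    # Count how often each label is predicted across a list of raw sentences.
    counts = Counter()
    model.eval()
    with torch.no_grad():
        for text in sentences:
            enc = tokenizer(text, return_tensors="pt", truncation=True)
            logits = model(**enc).logits  # shape: (1, seq_len, num_labels)
            pred_ids = logits.argmax(dim=-1)[0].tolist()
            counts.update(id2label[i] for i in pred_ids)
    return counts

# e.g. predicted_label_counts(model, tokenizer, test_sentences, model.config.id2label)

If some labels never show up in these counts, the zero scores are at least consistent with the model never emitting those tags.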
I also tested the model on the training data and got the following results. I don't understand why this happens on data the model has already seen.

{'age': {'f1': 0.9339622641509434,
         'number': 420,
         'precision': 0.9252336448598131,
         'recall': 0.9428571428571428},
 'anatomie': {'f1': 0.0, 'number': 1070, 'precision': 0.0, 'recall': 0.0},
 'date': {'f1': 0.0, 'number': 38, 'precision': 0.0, 'recall': 0.0},
 'dose': {'f1': 0.0, 'number': 102, 'precision': 0.0, 'recall': 0.0},
 'duree': {'f1': 0.0, 'number': 105, 'precision': 0.0, 'recall': 0.0},
 'examen': {'f1': 0.0, 'number': 721, 'precision': 0.0, 'recall': 0.0},
 'frequence': {'f1': 0.0, 'number': 77, 'precision': 0.0, 'recall': 0.0},
 'genre': {'f1': 0.5926748057713652,
           'number': 426,
           'precision': 0.5621052631578948,
           'recall': 0.6267605633802817},
 'issue': {'f1': 0.18621973929236502,
           'number': 285,
           'precision': 0.1984126984126984,
           'recall': 0.17543859649122806},
 'mode': {'f1': 0.0, 'number': 77, 'precision': 0.0, 'recall': 0.0},
 'moment': {'f1': 0.0, 'number': 174, 'precision': 0.0, 'recall': 0.0},
 'origine': {'f1': 0.49336283185840707,
             'number': 426,
             'precision': 0.4665271966527197,
             'recall': 0.5234741784037559},
 'overall_accuracy': 0.8862863686790967,
 'overall_f1': 0.21902729417050434,
 'overall_precision': 0.37072243346007605,
 'overall_recall': 0.15542802486848398,
 'pathologie': {'f1': 0.0, 'number': 211, 'precision': 0.0, 'recall': 0.0},
 'sosy': {'f1': 0.03614457831325302,
          'number': 1161,
          'precision': 0.03911735205616851,
          'recall': 0.03359173126614987},
 'substance': {'f1': 0.0, 'number': 371, 'precision': 0.0, 'recall': 0.0},
 'traitement': {'f1': 0.0, 'number': 254, 'precision': 0.0, 'recall': 0.0},
 'valeur': {'f1': 0.0, 'number': 355, 'precision': 0.0, 'recall': 0.0}}
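For reference, these are entity-level metrics in the seqeval style; roughly, such a report is produced like this (a sketch, assuming predictions and references are aligned lists of BIO tag sequences with special tokens already stripped):

import evaluate

seqeval = evaluate.load("seqeval")
# predictions / references: aligned lists of BIO tag sequences, e.g.
predictions = [["O", "B-age", "I-age", "O"]]
references = [["O", "B-age", "I-age", "O"]]
results = seqeval.compute(predictions=predictions, references=references)
# results holds one dict per entity type (precision / recall / f1 / number)
# plus the overall_precision / overall_recall / overall_f1 / overall_accuracy keys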

Any clues on how to fix this issue? Is it possible that the problem comes from the frequency of the labels in the dataset (I sketch how I would check this below), or could it be something else? And what can I do to fix it?
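In case it helps, this is how I would check the label distribution in the training set (a sketch; I assume each example exposes its gold tags as an integer ner_tags sequence, as is common for token-classification datasets, and gold_label_distribution is a name I made up):

from collections import Counter

def gold_label_distribution(dataset, id2label):
    # Count gold tags across a split to spot class imbalance.
    counts = Counter()
    for example in dataset:
        counts.update(id2label[t] for t in example["ner_tags"])
    return counts.most_common()

# e.g. gold_label_distribution(train_dataset, model.config.id2label)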