Weird results when testing a NER model

I am working on a NER task, fine-tuning the camembert-base model with PyTorch. The model is not giving good results yet, but at least the labels of some sentences from the training corpus are predicted correctly.

The training is done using this dataset; more details can be found in this notebook.

The weird thing about this model is that in the test phase, some labels get a precision, recall, and F1 score of zero. I think this means the model did not manage to predict any entities for those labels, which seems very unlikely to me: even a model making random guesses should get at least one entity of each label right.
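To verify that hypothesis, here is roughly how I count the labels the model actually predicts on a set of sentences (a minimal sketch: model and tokenizer come from my fine-tuning run, id2label is the usual id-to-label mapping from the model config, and predicted_label_counts is just a name I chose):

from collections import Counter

import torch

def predicted_label_counts(model, tokenizer, sentences, id2label):
    # Count how often each label is predicted across a list of raw sentences.
    counts = Counter()
    model.eval()
    with torch.no_grad():
        for text in sentences:
            enc = tokenizer(text, return_tensors="pt", truncation=True)
            logits = model(**enc).logits  # shape: (1, seq_len, num_labels)
            pred_ids = logits.argmax(dim=-1)[0].tolist()
            counts.update(id2label[i] for i in pred_ids)
    return counts

# e.g. predicted_label_counts(model, tokenizer, test_sentences, model.config.id2label)

If some labels never show up in these counts, the zero scores are at least consistent with the model never emitting those tags.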
I also tested the model on the training data and got the following results. I don't understand why this happens on data the model has already seen.

{'age': {'f1': 0.9339622641509434,
         'number': 420,
         'precision': 0.9252336448598131,
         'recall': 0.9428571428571428},
 'anatomie': {'f1': 0.0, 'number': 1070, 'precision': 0.0, 'recall': 0.0},
 'date': {'f1': 0.0, 'number': 38, 'precision': 0.0, 'recall': 0.0},
 'dose': {'f1': 0.0, 'number': 102, 'precision': 0.0, 'recall': 0.0},
 'duree': {'f1': 0.0, 'number': 105, 'precision': 0.0, 'recall': 0.0},
 'examen': {'f1': 0.0, 'number': 721, 'precision': 0.0, 'recall': 0.0},
 'frequence': {'f1': 0.0, 'number': 77, 'precision': 0.0, 'recall': 0.0},
 'genre': {'f1': 0.5926748057713652,
           'number': 426,
           'precision': 0.5621052631578948,
           'recall': 0.6267605633802817},
 'issue': {'f1': 0.18621973929236502,
           'number': 285,
           'precision': 0.1984126984126984,
           'recall': 0.17543859649122806},
 'mode': {'f1': 0.0, 'number': 77, 'precision': 0.0, 'recall': 0.0},
 'moment': {'f1': 0.0, 'number': 174, 'precision': 0.0, 'recall': 0.0},
 'origine': {'f1': 0.49336283185840707,
             'number': 426,
             'precision': 0.4665271966527197,
             'recall': 0.5234741784037559},
 'overall_accuracy': 0.8862863686790967,
 'overall_f1': 0.21902729417050434,
 'overall_precision': 0.37072243346007605,
 'overall_recall': 0.15542802486848398,
 'pathologie': {'f1': 0.0, 'number': 211, 'precision': 0.0, 'recall': 0.0},
 'sosy': {'f1': 0.03614457831325302,
          'number': 1161,
          'precision': 0.03911735205616851,
          'recall': 0.03359173126614987},
 'substance': {'f1': 0.0, 'number': 371, 'precision': 0.0, 'recall': 0.0},
 'traitement': {'f1': 0.0, 'number': 254, 'precision': 0.0, 'recall': 0.0},
 'valeur': {'f1': 0.0, 'number': 355, 'precision': 0.0, 'recall': 0.0}}
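For reference, these are entity-level metrics in the seqeval style; roughly, such a report is produced like this (a sketch, assuming predictions and references are aligned lists of BIO tag sequences with special tokens already stripped):

import evaluate

seqeval = evaluate.load("seqeval")
# predictions / references: aligned lists of BIO tag sequences, e.g.
predictions = [["O", "B-age", "I-age", "O"]]
references = [["O", "B-age", "I-age", "O"]]
results = seqeval.compute(predictions=predictions, references=references)
# results holds one dict per entity type (precision / recall / f1 / number)
# plus the overall_precision / overall_recall / overall_f1 / overall_accuracy keys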

Any clues on how to fix this issue? Is it possible that the problem comes from the frequency of the labels in the dataset (I sketch how I would check this below), or could it be something else? And what can I do to fix it?
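In case it helps, this is how I would check the label distribution in the training set (a sketch; I assume each example exposes its gold tags as an integer ner_tags sequence, as is common for token-classification datasets, and gold_label_distribution is a name I made up):

from collections import Counter

def gold_label_distribution(dataset, id2label):
    # Count gold tags across a split to spot class imbalance.
    counts = Counter()
    for example in dataset:
        counts.update(id2label[t] for t in example["ner_tags"])
    return counts.most_common()

# e.g. gold_label_distribution(train_dataset, model.config.id2label)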