Improving performance results for BERT

I’m using the bert-base-german-cased model to perform token classification with custom NER labels on a dataset of German court documents. I have 11 labels in total (including the O label); however, they are not tagged in BIO format. I’m training and evaluating the model on an NVIDIA GeForce GTX Titan X.

But despite the good resources and a model that was actually pretrained on German judicial documents, the results are rather lacking.

                                    precision    recall  f1-score   support

                              Date       0.87      0.99      0.93       407
                   Schadensbetrag       0.77      0.58      0.66       112
                            Delikt       0.59      0.50      0.54        44
                    Gestaendnis_ja       0.60      0.71      0.65        21
                    Vorstrafe_nein       0.00      0.00      0.00         6
Strafe_Gesamtfreiheitsstrafe_Dauer       0.76      0.91      0.83        35
          Strafe_Gesamtsatz_Betrag       0.42      0.52      0.46        25
           Strafe_Gesamtsatz_Dauer       0.52      0.82      0.64        28
                 Strafe_Tatbestand       0.30      0.29      0.30       283

                        micro avg       0.65      0.68      0.66       961
                        macro avg       0.54      0.59      0.56       961
                     weighted avg       0.64      0.68      0.66       961


What steps could I take to improve these results?
Perhaps the problem is the low number of examples for some of the labels, or that the entities are often not single tokens but spans of multiple tokens?
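
If you want to experiment with BIO tagging, which often helps a model learn span boundaries for multi-token entities, here is a rough sketch of the conversion. It assumes "O" marks non-entity tokens and that adjacent tokens with the same tag belong to one span (two distinct adjacent entities of the same type cannot be separated without extra span information):

# Rough sketch: convert plain per-token tags to BIO.
# Assumes "O" marks non-entity tokens and that adjacent tokens
# with the same tag belong to one span.
def to_bio(tags):
    bio = []
    prev = "O"
    for tag in tags:
        if tag == "O":
            bio.append("O")
        elif tag == prev:
            bio.append("I-" + tag)
        else:
            bio.append("B-" + tag)
        prev = tag
    return bio

print(to_bio(["O", "Date", "Date", "O", "Delikt"]))
# ['O', 'B-Date', 'I-Date', 'O', 'B-Delikt']

Note that labels.txt would then also need the B-/I- variants of each label.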

I would be glad for any hints from more experienced users. I can also share the data or other files if they are relevant.

This is my config file:

{
    "data_dir": "./Data",
    "labels": "./Data/labels.txt",
    "model_name_or_path": "bert-base-german-cased",
    "output_dir": "./Data/Models",
    "task_type": "NER",
    "max_seq_length": 180,
    "num_train_epochs": 6,
    "per_device_train_batch_size": 48,
    "seed": 7,
    "fp16": true,
    "do_train": true,
    "do_predict": true,
    "do_eval": true
}
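
For reference, a config like this is usually passed straight to the example script (e.g. python run_ner.py config.json), which parses it with HfArgumentParser. Here is a minimal sketch of that mechanism; the ModelArguments and DataArguments dataclasses below are simplified stand-ins for the ones in the actual token-classification example:

from dataclasses import dataclass
from transformers import HfArgumentParser, TrainingArguments

@dataclass
class ModelArguments:
    # Simplified stand-in for the example script's model arguments
    model_name_or_path: str = "bert-base-german-cased"
    task_type: str = "NER"

@dataclass
class DataArguments:
    # Simplified stand-in for the example script's data arguments
    data_dir: str = "./Data"
    labels: str = "./Data/labels.txt"
    max_seq_length: int = 180

parser = HfArgumentParser((ModelArguments, DataArguments, TrainingArguments))
# parse_json_file maps each key in the JSON file onto the matching
# dataclass field; unknown keys raise an error.
model_args, data_args, training_args = parser.parse_json_file("config.json")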

Is there anyone who could help with this topic?

As you suggest, I’d start with an exploration of your dataset. See how many examples of each tag/token you have, and check whether rebalancing improves your scores.
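
Something like this quick frequency check could be a starting point. It assumes your training data is in the CoNLL-style one-token-per-line format ("token tag") that the token-classification example expects, with blank lines between sentences; the file name is an assumption:

from collections import Counter

tag_counts = Counter()
with open("./Data/train.txt", encoding="utf-8") as f:
    for line in f:
        line = line.strip()
        if not line:  # blank lines separate sentences
            continue
        token, tag = line.rsplit(" ", 1)
        tag_counts[tag] += 1

for tag, count in tag_counts.most_common():
    print(f"{tag:>40} {count}")

With only 6 supporting examples, a class like Vorstrafe_nein is unlikely to be learned at all, so oversampling those sentences or collecting more of them would be the first thing to try.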