Bert Multi-lingual fine-tuning for multilabel classification

MajTad · January 25, 2022, 7:31pm

Hi,
I’m trying to do multilabel classification of French email sentences, with categories such as commitment, proposition, meeting, request, subjective, etc.

The first problem I faced is that I don’t have labeled sentences; I only have French emails as a dataset. Because of this, I looked for labeled data and found the BC3 dataset (English emails), in which sentences are annotated with some of the labels listed above. So I came up with this approach: first fine-tune a multilingual BERT on the BC3 dataset for the multilabel classification task, then do zero-shot transfer learning with the fine-tuned model (or simply use it for inference) on the sentences of my French emails. What do you think about this approach?
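To make the inference step concrete, here is a minimal sketch of how the model outputs could be turned into labels for one French sentence. It assumes the fine-tuned model returns one raw logit per category; the label names, the example logits, and the 0.5 threshold are illustrative assumptions, not values from my data.

```python
import math

# Hypothetical label names, taken from the categories mentioned above.
LABELS = ["commitment", "proposition", "meeting", "request", "subjective"]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_multilabel(logits, labels=LABELS, threshold=0.5):
    """Return every label whose independent sigmoid probability reaches the threshold."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    return [name for name, logit in zip(labels, logits) if sigmoid(logit) >= threshold]


# Made-up logits for one sentence; a sentence may get zero, one, or several labels.
example_logits = [2.1, -1.3, 0.4, -3.0, -0.2]
predicted = decode_multilabel(example_logits)
print(predicted)
```

Each label is decided independently with a sigmoid rather than a softmax, which is what makes this multilabel and not multiclass. The threshold would probably need tuning per label on a small hand-labeled French sample.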

So I started by preprocessing the BC3 dataset and obtained 848 sentences, each with annotation counts for every category. In the image below, the last 5 columns represent the number of times the annotators labeled a sentence with a specific label.

Are those 848 samples enough to fine-tune a multilingual BERT model?

I tried to fine-tune by representing the categories as in the image below.
[image]
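For readers who cannot see the image, here is a sketch of one possible way to turn per-label annotation counts into training targets. The label order, the vote threshold `min_votes`, and the assumption of 3 annotators per sentence are illustrative and should be checked against the actual BC3 annotations.

```python
# Hypothetical label order for the 5 annotation-count columns.
LABELS = ["commitment", "proposition", "meeting", "request", "subjective"]


def counts_to_multi_hot(counts, min_votes=2):
    """Hard targets: a label is on when at least `min_votes` annotators chose it."""
    if any(c < 0 for c in counts):
        raise ValueError("annotation counts must be non-negative")
    return [1.0 if c >= min_votes else 0.0 for c in counts]


def counts_to_soft_targets(counts, num_annotators=3):
    """Soft targets: the fraction of annotators who chose each label."""
    if any(c < 0 or c > num_annotators for c in counts):
        raise ValueError("each count must be between 0 and num_annotators")
    return [c / num_annotators for c in counts]


# Made-up counts for one sentence, one value per label in LABELS.
row = [3, 0, 1, 2, 0]
hard = counts_to_multi_hot(row)
soft = counts_to_soft_targets(row)
print(hard)
print(soft)
```

Hard targets follow a majority vote, while soft targets keep the annotator disagreement; both can be used with a per-label binary cross-entropy loss.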

With one epoch and BATCH_SIZE = 4, the loss didn’t converge; instead it oscillated between 0.79 and 0.34.
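To put those numbers in context, here is a small pure-Python sketch of the per-label binary cross-entropy on logits (the same quantity that PyTorch's `BCEWithLogitsLoss` averages with its default mean reduction), plus an exponential moving average for smoothing noisy per-batch values. Whether my training used exactly this loss is an assumption, and the `beta` smoothing factor is arbitrary. Note that an untrained head with logits near zero gives a loss of about ln 2 ≈ 0.693, and that 848 sentences at batch size 4 is at most 212 steps per epoch, so a per-batch loss bouncing around in that range after one epoch may mostly reflect batch noise.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over labels, computed stably from raw logits."""
    if len(logits) != len(targets):
        raise ValueError("logits and targets must have the same length")
    total = 0.0
    for x, y in zip(logits, targets):
        # Stable form of -[y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))].
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


def smooth(losses, beta=0.9):
    """Exponential moving average of a sequence of per-batch losses."""
    smoothed = []
    avg = None
    for loss in losses:
        avg = loss if avg is None else beta * avg + (1.0 - beta) * loss
        smoothed.append(avg)
    return smoothed


# Loss of an uninformed prediction (all logits zero) on 5 labels.
initial_loss = bce_with_logits([0.0] * 5, [1.0, 0.0, 0.0, 1.0, 0.0])
print(initial_loss)

# Smoothing a made-up noisy sequence of per-batch losses.
print(smooth([0.79, 0.34, 0.71, 0.40, 0.62]))
```

Comparing the smoothed curve, or the loss on a held-out split, across several epochs would say more about convergence than individual batch values.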

What advice would you give to solve this kind of problem?

Thanks.
