Hi,
I am working on a multi-label topic classifier that assigns webpages to some of our ~100 topics. The classifier currently uses a basic neural network, and I would like to adapt the XLM-R model provided by Hugging Face to give the classifier multilingual capabilities.
However, when I train a classifier using XLM-R, its performance (measured by PR AUC) is worse than that of the classifier using the basic neural network.
What can I do to improve the performance of a transformer-based Neural Network model?
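For concreteness, this is roughly the kind of setup I have in mind (a simplified sketch; the checkpoint name, label count, and threshold are placeholders for our actual configuration):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder values; our real setup uses its own label set (~100 topics).
MODEL_NAME = "xlm-roberta-base"
NUM_TOPICS = 100

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME,
    num_labels=NUM_TOPICS,
    problem_type="multi_label_classification",  # trains with BCE loss, one sigmoid per topic
)

inputs = tokenizer("Example webpage text", return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.sigmoid(logits)          # independent probability per topic
predicted_topics = (probs > 0.5).nonzero()  # arbitrary threshold, just for illustration
```

With `problem_type="multi_label_classification"` each topic gets an independent probability, which matches the multi-label setup of the existing classifier.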
I don’t have an answer, unfortunately, but I have the same issue and can add some details. I have a classification problem with 60 classes and ~70k training documents, unfortunately with a very unbalanced class distribution. My baseline is a FastText classifier trained on the same data, which achieves an accuracy of ~0.45. The majority of the documents are in English, but some are in other languages (all
I have followed the tutorial for fine-tuning pre-trained classification models using the Trainer API. I have not changed much from the example other than the number of labels and the model.
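In code, the only substantive changes I made to the tutorial's example are the checkpoint and the number of labels, roughly like this (the checkpoint name here is just a placeholder for the one I actually used):

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Only deviations from the tutorial: the checkpoint and num_labels.
MODEL_NAME = "xlm-roberta-base"  # placeholder; I also tried "bert-base-cased" as in the tutorial
NUM_LABELS = 60

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME,
    num_labels=NUM_LABELS,  # classification head is freshly initialised for 60 classes
)
```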
Running on a Colab notebook with a GPU, training time for a single epoch is roughly 4 h. I’ve run intermediate evaluations every 500 steps, and the accuracy stays around 0.04 with no change in either direction. The same happens when using a different model (e.g. bert-base-cased, as shown in the tutorial).
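For the intermediate evaluations I use the accuracy metric from the tutorial; the relevant bits look roughly like this (a simplified sketch, the other hyperparameters are placeholders):

```python
import numpy as np
from transformers import TrainingArguments

# Accuracy over the highest-scoring class, as in the tutorial's compute_metrics.
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {"accuracy": float((predictions == labels).mean())}

training_args = TrainingArguments(
    output_dir="out",
    evaluation_strategy="steps",     # run evaluation during training...
    eval_steps=500,                  # ...every 500 steps
    per_device_train_batch_size=8,   # placeholder; I kept the tutorial's defaults
    num_train_epochs=3,
)
```

These arguments and the metric function go into the Trainer together with the tokenized train/eval splits, as in the tutorial.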
My suspicion is that the model does not learn anything at all; the accuracy is very close to random guessing.
What am I missing? Is there a more suitable, up-to-date tutorial somewhere?