Question and advice on how to fine-tune DistilBERT for multi-label classification

I am currently fine-tuning DistilBERT for sequence classification on a multi-label task, specifically 3 labels for sentiment classification, on my own custom dataset, and I am getting quite high loss values of between 0.4 and 0.5. I have tried various approaches, like learning rates from 3e-05 down to 1e-05, and dropout rates of 0.3 to 0.4 on the embeddings and 0.2 to 0.4 on the sequence classification layer. Are there other ways of reducing the loss?
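For reference, the knobs described above (3 labels, embedding dropout, classification-head dropout) can be set through `DistilBertConfig`; a minimal sketch, where the specific dropout values are taken from the question rather than being recommendations:

```python
from transformers import DistilBertConfig, DistilBertForSequenceClassification

# Values below mirror the setup described in the question; tune per experiment.
config = DistilBertConfig(
    num_labels=3,
    problem_type="multi_label_classification",  # makes the model use BCEWithLogitsLoss
    dropout=0.3,              # dropout on embeddings / transformer layers
    seq_classif_dropout=0.2,  # dropout on the classification head
)
model = DistilBertForSequenceClassification(config)
```

Note that `problem_type="multi_label_classification"` switches the loss to `BCEWithLogitsLoss`; if the three sentiment labels are mutually exclusive (one label per example), the task is multi-class rather than multi-label and the default `CrossEntropyLoss` applies instead.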

  • Maybe you are not training long enough. Is your validation loss much higher than your training loss?
  • Maybe you do not have enough data
  • Maybe your dataset is very imbalanced
  • Maybe the problem is simply too hard and your labels are too similar
  • That is just a small range of learning rates. Try starting from 1e-03 and decreasing until your train/validation loss curves look promising
  • Use a lr scheduler
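The last two suggestions combine naturally: sweep the peak learning rate downward from 1e-03 while using a warmup-then-linear-decay schedule (the shape produced by `get_linear_schedule_with_warmup` in transformers). A dependency-free sketch of that schedule, with placeholder step counts:

```python
def linear_warmup_decay(step, total_steps, warmup_steps, peak_lr):
    """Learning rate at a given step: linear warmup from 0 to peak_lr,
    then linear decay back to 0 over the remaining steps."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))

# Example: 1000 training steps, 100 warmup steps, peak learning rate of 1e-03.
lrs = [linear_warmup_decay(s, 1000, 100, 1e-3) for s in range(1000)]
```

In PyTorch this shape can be plugged in directly via `torch.optim.lr_scheduler.LambdaLR`, or by using `get_linear_schedule_with_warmup` from transformers with the same step counts.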