I am currently fine-tuning DistilBERT for sequence classification on a multi-label setup, specifically 3 labels for sentiment classification, on my own custom dataset, and I am getting quite high loss values of between 0.4 and 0.5. I have tried various things, such as learning rates from 3e-05 down to 1e-05, and dropout rates of 0.3 to 0.4 on the embeddings and 0.2 to 0.4 on the sequence classification layer. Are there other ways of reducing the loss?
- Maybe you are not training long enough. Is your validation loss much higher than your training loss?
- Maybe you do not have enough data
- Maybe your dataset is very imbalanced
- Maybe the problem is simply too hard and your labels are too similar
- That is quite a narrow range of learning rates. Try starting from 1e-03 and decreasing until your train/validation loss curves look promising
- Use a learning-rate scheduler
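On the imbalance point: a common remedy is to weight the loss by inverse class frequency. A minimal sketch, assuming your labels are available as a plain Python list of integers (the `train_labels` name and the toy values here are placeholders, not from your post):

```python
from collections import Counter

# Toy label list standing in for the real dataset's labels (placeholder).
train_labels = [0, 0, 0, 0, 0, 1, 1, 2]

num_labels = 3
counts = Counter(train_labels)
total = len(train_labels)

# Inverse-frequency weights: rarer classes get a larger weight so the
# loss is not dominated by the majority class.
weights = [total / (num_labels * counts[c]) for c in range(num_labels)]
print(weights)
```

In PyTorch you can then pass these as `torch.nn.CrossEntropyLoss(weight=torch.tensor(weights))` so that errors on minority classes count more toward the loss.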
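To make the scheduler suggestion concrete: a typical choice for transformer fine-tuning is linear warmup followed by linear decay, the same shape as `transformers.get_linear_schedule_with_warmup`. Here it is sketched as a plain multiplier function (the step counts are illustrative, not tuned for your dataset):

```python
def lr_lambda(step, warmup_steps, total_steps):
    """Multiplier applied to the base learning rate at a given step."""
    if step < warmup_steps:
        # Linear warmup: ramp from 0 up to 1 over warmup_steps.
        return step / max(1, warmup_steps)
    # Linear decay: ramp from 1 back down to 0 by total_steps.
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

# Example: 100 training steps with 10% warmup.
mults = [lr_lambda(s, 10, 100) for s in (0, 5, 10, 55, 100)]
print(mults)  # [0.0, 0.5, 1.0, 0.5, 0.0]
```

In PyTorch this plugs in via `torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda s: lr_lambda(s, 10, 100))`, calling `scheduler.step()` once per optimizer step.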