Sentence transformer: poor performance after fine-tuning

Hi All,

I’m trying to fine-tune a MiniLM sentence transformer model on two larger training datasets (400k and 1.5M anchor/positive/negative triplets) to measure the impact of the amount of domain knowledge fed into the model, e.g. how much domain data is needed to gain X points of accuracy.
The base model already performs reasonably well on my downstream tasks (~70%), and I’d like to push that to 80-90% through fine-tuning.
Unfortunately, after numerous attempts, the fine-tuned model’s performance drops to 50-60%. I’ve tried increasing/decreasing the number of epochs, batch sizes, and learning rates, and changing the loss function from TripletLoss to MultipleNegativesRankingLoss, but nothing even preserves the original scores, let alone improves them.
Is this an overfitting issue, or is something else causing it? Would it be easier to start from a pretrained checkpoint like nreimers/MiniLM-L6-H384-uncased instead?
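For reference, my training loop is roughly along these lines (a simplified sketch using the sentence-transformers fit API; the base model name, the output path, and the inline triplets are placeholders for my actual setup):

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Placeholder base model; the real runs start from a MiniLM sentence transformer
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# (anchor, positive, negative) triplets -- dummy examples shown here
train_examples = [
    InputExample(texts=["anchor text", "related text", "unrelated text"]),
    # ... 400k / 1.5M domain triplets in the real runs
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=64)

# Swapped between losses.TripletLoss(model) and MultipleNegativesRankingLoss;
# MNR loss additionally treats other in-batch positives as negatives
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    warmup_steps=100,
    output_path="minilm-domain-finetuned",  # placeholder output path
)
```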

Thank you in advance,
Mark

I’ve added GLUE and STSB score evaluation at the end of the training script to get an idea of how well the model performs on benchmark data. Up to about 10-20k training examples, the model maintains the original ~0.86 Pearson/Spearman scores, but they slowly drop as I increase the amount of training data: at 50k the score drops to ~0.83, at 100k to ~0.80, and when I apply all 400k training examples in a single epoch the scores fall below ~0.40.
My guess is that this is indeed an overfitting issue.
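For reference, the STSB check at the end of the script is wired up roughly like this (a simplified sketch using EmbeddingSimilarityEvaluator on the GLUE STS-B validation split; the model path is a placeholder):

```python
from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

# Placeholder path to the fine-tuned checkpoint
model = SentenceTransformer("minilm-domain-finetuned")

stsb = load_dataset("glue", "stsb", split="validation")

evaluator = EmbeddingSimilarityEvaluator(
    sentences1=stsb["sentence1"],
    sentences2=stsb["sentence2"],
    scores=[label / 5.0 for label in stsb["label"]],  # rescale 0-5 gold scores to 0-1
    name="stsb-dev",
)

# Reports Pearson/Spearman correlation between cosine similarities and gold scores
print(evaluator(model))
```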