Quora Duplicate Questions Multi-Task Learning

Hey all,

I'm completely new to transformers and interested in Sentence Transformers, so I ran the multi-task learning training tutorial in a Kaggle notebook. After a few epochs of fine-tuning stsb-distilbert-base, the SequentialEvaluator metric increased from 0.937 to 0.952. Everything's fine so far.
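For reference, my setup looks roughly like the sketch below. It is simplified from the tutorial and uses the legacy `fit()` API; the tiny inline examples stand in for the actual Quora duplicate questions data, and the dev evaluator in my notebook also includes the paraphrase-mining and information-retrieval evaluators.

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses, evaluation

# Placeholder data -- in the real notebook these come from the Quora duplicate questions dataset
labeled_pairs = [
    InputExample(texts=["How do I learn Python?", "How can I learn Python?"], label=1),
    InputExample(texts=["How do I learn Python?", "What is the capital of France?"], label=0),
]
duplicate_pairs = [ex for ex in labeled_pairs if ex.label == 1]

model = SentenceTransformer("stsb-distilbert-base")

# Two training objectives, as in the multi-task tutorial:
# 1) contrastive loss on labeled (duplicate / not duplicate) pairs
# 2) multiple negatives ranking loss on the positive duplicate pairs
classification_loader = DataLoader(labeled_pairs, shuffle=True, batch_size=32)
duplicates_loader = DataLoader(duplicate_pairs, shuffle=True, batch_size=32)
contrastive_loss = losses.OnlineContrastiveLoss(model=model)
mnrl_loss = losses.MultipleNegativesRankingLoss(model=model)

# Dev evaluator: only a binary-classification evaluator here, wrapped in the SequentialEvaluator
dev_evaluator = evaluation.SequentialEvaluator([
    evaluation.BinaryClassificationEvaluator(
        sentences1=["How do I learn Python?", "How do I learn Python?"],
        sentences2=["How can I learn Python?", "What is the capital of France?"],
        labels=[1, 0],
        name="quora-dev",
    )
])

model.fit(
    train_objectives=[
        (classification_loader, contrastive_loss),
        (duplicates_loader, mnrl_loss),
    ],
    evaluator=dev_evaluator,
    epochs=3,
    warmup_steps=100,
    output_path="output/quora-multitask",
)
```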

I then swapped in some of the top models from the MTEB leaderboard. Here are the scores I observed before and after fine-tuning for a few epochs (evaluated roughly as in the sketch after this list):

  • mixedbread-ai/mxbai-embed-large-v1: 0.9830 to 0.0007
  • avsolatorio/GIST-large-Embedding-v0: 0.9828 to 0.9665
  • BAAI/bge-large-en-v1.5: 0.9823 to 0.0008
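The "before" numbers come from running the same dev evaluator on each leaderboard model prior to calling `fit()`, roughly like this (again a sketch; `dev_evaluator` and the `fit()` call are the same as above):

```python
for model_name in [
    "mixedbread-ai/mxbai-embed-large-v1",
    "avsolatorio/GIST-large-Embedding-v0",
    "BAAI/bge-large-en-v1.5",
]:
    model = SentenceTransformer(model_name)
    print(model_name, "before:", dev_evaluator(model))
    # ... then the same model.fit(...) call as above, and evaluate again for the "after" score
```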

While I could understand a slight decrease for the second model (I'd still love to hear your interpretation of it), my question is mainly about the other two models, where fine-tuning seems not to work at all. How can this be explained or interpreted?

Also, where can I find information on Hugging Face that would tell me whether a specific fine-tuning procedure is likely to work for a given model? And if a fine-tuning procedure works on one dataset, is there any way to estimate how likely it is to also work well on another dataset?

Many thanks for helping out a newcomer!