Quora Duplicate Questions Multi-Task Learning

Hey all,

I'm completely new to transformers and interested in Sentence Transformers, so I ran the multi-task learning training tutorial in a Kaggle notebook. After a few epochs of fine-tuning stsb-distilbert-base, the SequentialEvaluator metric increased from 0.937 to 0.952. Everything's fine so far.
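For reference, my setup looks roughly like the sketch below. It is simplified from the tutorial and uses the legacy `fit()` API; the tiny inline examples stand in for the actual Quora duplicate questions data, and the dev evaluator in my notebook also includes the paraphrase-mining and information-retrieval evaluators.

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses, evaluation

# Placeholder data -- in the real notebook these come from the Quora duplicate questions dataset
labeled_pairs = [
    InputExample(texts=["How do I learn Python?", "How can I learn Python?"], label=1),
    InputExample(texts=["How do I learn Python?", "What is the capital of France?"], label=0),
]
duplicate_pairs = [ex for ex in labeled_pairs if ex.label == 1]

model = SentenceTransformer("stsb-distilbert-base")

# Two training objectives, as in the multi-task tutorial:
# 1) contrastive loss on labeled (duplicate / not duplicate) pairs
# 2) multiple negatives ranking loss on the positive duplicate pairs
classification_loader = DataLoader(labeled_pairs, shuffle=True, batch_size=32)
duplicates_loader = DataLoader(duplicate_pairs, shuffle=True, batch_size=32)
contrastive_loss = losses.OnlineContrastiveLoss(model=model)
mnrl_loss = losses.MultipleNegativesRankingLoss(model=model)

# Dev evaluator: only a binary-classification evaluator here, wrapped in the SequentialEvaluator
dev_evaluator = evaluation.SequentialEvaluator([
    evaluation.BinaryClassificationEvaluator(
        sentences1=["How do I learn Python?", "How do I learn Python?"],
        sentences2=["How can I learn Python?", "What is the capital of France?"],
        labels=[1, 0],
        name="quora-dev",
    )
])

model.fit(
    train_objectives=[
        (classification_loader, contrastive_loss),
        (duplicates_loader, mnrl_loss),
    ],
    evaluator=dev_evaluator,
    epochs=3,
    warmup_steps=100,
    output_path="output/quora-multitask",
)
```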

I then swapped in some of the top models from the MTEB leaderboard. Here are the scores I observed before and after fine-tuning for a few epochs (evaluated roughly as in the sketch after this list):

  • mixedbread-ai/mxbai-embed-large-v1: 0.9830 to 0.0007
  • avsolatorio/GIST-large-Embedding-v0: 0.9828 to 0.9665
  • BAAI/bge-large-en-v1.5: 0.9823 to 0.0008
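The "before" numbers come from running the same dev evaluator on each leaderboard model prior to calling `fit()`, roughly like this (again a sketch; `dev_evaluator` and the `fit()` call are the same as above):

```python
for model_name in [
    "mixedbread-ai/mxbai-embed-large-v1",
    "avsolatorio/GIST-large-Embedding-v0",
    "BAAI/bge-large-en-v1.5",
]:
    model = SentenceTransformer(model_name)
    print(model_name, "before:", dev_evaluator(model))
    # ... then the same model.fit(...) call as above, and evaluate again for the "after" score
```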

While I could understand a slight decrease for the second model (I'd still love to hear your interpretation of it), my question is mainly about the other two models, where fine-tuning seems not to work at all. How can this be explained or interpreted?

Also, where can I find information on Hugging Face that would tell me whether a specific fine-tuning procedure is likely to work for a given model? And if a fine-tuning procedure works on one dataset, is there any way to estimate how likely it is to also work well on another dataset?

Many thanks for helping out a newcomer!