Fine-Tuning Strategies: Choosing Between microsoft/mpnet-base and sentence-transformers/all-mpnet-base-v2

Hi everyone,

I’m looking for recommendations on which model to fine-tune for a similarity task. I have a dataset of around 5,000 samples. The Sentence Transformers documentation (Training Overview — Sentence Transformers documentation) lists various base models to choose from, including microsoft/mpnet-base.

Instead, I am planning to fine-tune all-mpnet-base-v2, which is not mentioned in that documentation. My reasoning is that all-mpnet-base-v2 has already been fine-tuned for sentence embeddings, so it should be a stronger starting point than the raw microsoft/mpnet-base checkpoint. Is this reasoning correct? Additionally, any insights on pooling strategies or general tips for fine-tuning would be greatly appreciated!
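For context on the pooling question: my understanding is that all-mpnet-base-v2 uses mean pooling over token embeddings (with the attention mask, so padding tokens are ignored). Here is a minimal NumPy sketch of that operation as I understand it — the shapes and names are just illustrative, not taken from any library:

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Masked mean pooling: average the token embeddings of one sequence,
    ignoring padding positions.

    token_embeddings: (seq_len, dim) array of per-token vectors.
    attention_mask:   (seq_len,) array of 1s (real tokens) and 0s (padding).
    """
    mask = attention_mask[:, None].astype(float)        # (seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=0)      # sum of real tokens, (dim,)
    counts = np.clip(mask.sum(), 1e-9, None)            # number of real tokens, avoid /0
    return summed / counts

# Example: 3 tokens, last one is padding, embedding dim = 2.
emb = np.array([[1.0, 2.0],
                [3.0, 4.0],
                [100.0, 100.0]])   # padding row is excluded by the mask
mask = np.array([1, 1, 0])
print(mean_pool(emb, mask))  # → [2. 3.]
```

My assumption is that when fine-tuning on top of all-mpnet-base-v2, one would keep this same pooling rather than switch to CLS pooling, but I’d welcome corrections on that.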
