Sentence Transformers paraphrase-MiniLM fine-tuning error

Hi @nreimers,

Really love your sentence transformers. I’m currently using them as base models and fine-tuning them on a 3-class classification task with the standard Hugging Face Trainer.

This works very well with paraphrase-distilroberta-base-v2, but when I use MiniLM variants (I tried paraphrase-MiniLM-L6-v2 and flax-sentence-embeddings/all_datasets_v4_MiniLM-L6) I get the error below. It occurs during training (in trainer.train()) after around 100 steps. The exact same code works fine with any other transformer, including other sentence transformers like paraphrase-distilroberta-base-v2, but for some reason it fails with the MiniLM variants.

/usr/local/lib/python3.7/dist-packages/transformers/models/bert/modeling_bert.py in forward(self, input_ids, token_type_ids, position_ids, inputs_embeds, past_key_values_length)
    219         if self.position_embedding_type == "absolute":
    220             position_embeddings = self.position_embeddings(position_ids)
--> 221             embeddings += position_embeddings
    222         embeddings = self.LayerNorm(embeddings)
    223         embeddings = self.dropout(embeddings)

RuntimeError: The size of tensor a (1088) must match the size of tensor b (512) at non-singleton dimension 1

With flax-sentence-embeddings/all_datasets_v4_MiniLM-L12 I get the error:

RuntimeError: The size of tensor a (528) must match the size of tensor b (512) at non-singleton dimension 1

Note that I also get the error when fine-tuning nreimers/MiniLM-L6-H384-uncased.

Do you know where this could come from?

(Side note: do you recommend using the flax-sentence-embeddings/all_datasets… models? I didn’t find performance metrics for them on sbert.net. Are they better than the models in the ranking here: Pretrained Models — Sentence-Transformers documentation?)

Hi @MoritzLaurer
Happy to hear that.

I think the issue might be that the max length is not defined for these models. Then the text is not truncated to 512 word pieces, and inputs longer than the 512 position embeddings cause exactly this size mismatch.
Is it possible to set the max_length for the input text in the trainer?
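For example, something like this in the preprocessing step (just a sketch, assuming you tokenize with Dataset.map before handing the data to the Trainer; the dataset contents and column names are only illustrative):

from datasets import Dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/paraphrase-MiniLM-L6-v2")

def tokenize_function(examples):
    # Truncate explicitly so no sequence exceeds the model's 512 position embeddings
    return tokenizer(examples["text"], truncation=True, max_length=512)

dataset = Dataset.from_dict({"text": ["a short example", "another example"], "label": [0, 1]})
tokenized_dataset = dataset.map(tokenize_function, batched=True)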

I’m currently adding these models to the performance metrics. Yes, the flax-sentence-embeddings/all_datasets_v3 (or _v4) models work the best.
Also added them here:


I updated the tokenizer file. It should work now.
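A quick way to verify (illustrative, assuming the tokenizer is re-downloaded rather than loaded from a local cache): after reloading, tokenizer.model_max_length should report 512 instead of a very large default.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("flax-sentence-embeddings/all_datasets_v4_MiniLM-L6")
print(tokenizer.model_max_length)  # expected: 512 after the tokenizer config update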


Great, thank you very much!
I also just got it to work like this, when instantiating the tokenizer:

from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "flax-sentence-embeddings/all_datasets_v4_MiniLM-L6"
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True, max_length=512, model_max_length=512)
# label2id, id2label and device are defined earlier in my script
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3, max_length=512,
                                                           label2id=label2id, id2label=id2label).to(device)
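The decisive part seems to be model_max_length=512: with that set, a long input actually gets truncated (a quick illustrative check, assuming truncation=True is passed when tokenizing):

long_text = "word " * 2000
encoded = tokenizer(long_text, truncation=True)
print(len(encoded["input_ids"]))  # 512, so the 512 position embeddings are no longer exceeded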

Very good to know for future users of MiniLM models. I had the same error when using Microsoft’s MiniLM base models, and I suppose they won’t update their tokenizer config quickly.