Sentence Transformers paraphrase-MiniLM fine-tuning error

Hi @nreimers,

Really love your sentence transformers. I’m currently using them as base models and fine-tuning them on a 3-class classification task with the standard Hugging Face Trainer.

This works very well with paraphrase-distilroberta-base-v2, but when I use MiniLM variants (I tried paraphrase-MiniLM-L6-v2 and flax-sentence-embeddings/all_datasets_v4_MiniLM-L6) I get the error below. It occurs during training (in trainer.train()) after around 100 steps. The exact same code works fine with any other transformer, including other sentence transformers like paraphrase-distilroberta-base-v2, but for some reason it fails with the MiniLM variants.

/usr/local/lib/python3.7/dist-packages/transformers/models/bert/modeling_bert.py in forward(self, input_ids, token_type_ids, position_ids, inputs_embeds, past_key_values_length)
    219         if self.position_embedding_type == "absolute":
    220             position_embeddings = self.position_embeddings(position_ids)
--> 221             embeddings += position_embeddings
    222         embeddings = self.LayerNorm(embeddings)
    223         embeddings = self.dropout(embeddings)

RuntimeError: The size of tensor a (1088) must match the size of tensor b (512) at non-singleton dimension 1

With flax-sentence-embeddings/all_datasets_v4_MiniLM-L12 I get the error:

RuntimeError: The size of tensor a (528) must match the size of tensor b (512) at non-singleton dimension 1

Note that I also get the error when fine-tuning nreimers/MiniLM-L6-H384-uncased.

Do you know where this could come from?

(Side note: do you recommend using the flax-sentence-embeddings/all_datasets… models? I didn’t find performance metrics for them on sbert.net. Are they better than the models in the ranking here: Pretrained Models — Sentence-Transformers documentation?)

Hi @MoritzLaurer
Happy to hear that.

I think the issue might be that the max length is not defined for these models. Then the text is not truncated to 512 word pieces, and inputs longer than the 512 position embeddings cause exactly this size mismatch.
Is it possible to set the max_length for the input text in the trainer?
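For example, something like this in the preprocessing step (just a sketch, assuming you tokenize with Dataset.map before handing the data to the Trainer; the dataset contents and column names are only illustrative):

from datasets import Dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/paraphrase-MiniLM-L6-v2")

def tokenize_function(examples):
    # Truncate explicitly so no sequence exceeds the model's 512 position embeddings
    return tokenizer(examples["text"], truncation=True, max_length=512)

dataset = Dataset.from_dict({"text": ["a short example", "another example"], "label": [0, 1]})
tokenized_dataset = dataset.map(tokenize_function, batched=True)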

I’m currently adding these models to the performance metrics. Yes, the flax-sentence-embeddings/all_datasets_v3 (or _v4) models work the best.
Also added them here:


I updated the tokenizer file. It should work now.
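A quick way to verify (illustrative, assuming the tokenizer is re-downloaded rather than loaded from a local cache): after reloading, tokenizer.model_max_length should report 512 instead of a very large default.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("flax-sentence-embeddings/all_datasets_v4_MiniLM-L6")
print(tokenizer.model_max_length)  # expected: 512 after the tokenizer config update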


Great, thank you very much!
I also just got it to work like this, when instantiating the tokenizer:

from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "flax-sentence-embeddings/all_datasets_v4_MiniLM-L6"
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True, max_length=512, model_max_length=512)
# label2id, id2label and device are defined earlier in my script
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3, max_length=512,
                                                           label2id=label2id, id2label=id2label).to(device)
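The decisive part seems to be model_max_length=512: with that set, a long input actually gets truncated (a quick illustrative check, assuming truncation=True is passed when tokenizing):

long_text = "word " * 2000
encoded = tokenizer(long_text, truncation=True)
print(len(encoded["input_ids"]))  # 512, so the 512 position embeddings are no longer exceeded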

Very good to know for future users of MiniLM models. I had the same error when using Microsoft’s MiniLM base models, and I suppose they won’t update their tokenizer config quickly.