Hi @nreimers,
Really love your sentence transformers. I'm currently using them as base models for fine-tuning on a 3-class classification task with the standard Hugging Face Trainer.
This works very well with paraphrase-distilroberta-base-v2, but with variants of MiniLM-L6-v2 (I tried paraphrase-MiniLM-L6-v2 and flax-sentence-embeddings/all_datasets_v4_MiniLM-L6) I get the error shown below. It occurs during training (in trainer.train()) after around 100 steps. The exact same code works fine with any other transformer, including other sentence transformers like paraphrase-distilroberta-base-v2, but for some reason it comes up with the paraphrase-MiniLM variants.
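For reference, this is roughly what my training code looks like (a simplified sketch; the CSV files, column names, and hyperparameters are placeholders for my actual setup):

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "sentence-transformers/paraphrase-MiniLM-L6-v2"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

# Placeholder data files; my real dataset has a "text" and a "label" column
dataset = load_dataset("csv", data_files={"train": "train.csv", "validation": "dev.csv"})

def tokenize(batch):
    # No explicit max_length here -- I rely on the tokenizer defaults
    return tokenizer(batch["text"], truncation=True)

dataset = dataset.map(tokenize, batched=True)

training_args = TrainingArguments(
    output_dir="out",
    num_train_epochs=3,
    per_device_train_batch_size=16,
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    tokenizer=tokenizer,  # pads each batch dynamically
)
trainer.train()
```

And here is the full traceback: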
```
/usr/local/lib/python3.7/dist-packages/transformers/models/bert/modeling_bert.py in forward(self, input_ids, token_type_ids, position_ids, inputs_embeds, past_key_values_length)
    219         if self.position_embedding_type == "absolute":
    220             position_embeddings = self.position_embeddings(position_ids)
--> 221             embeddings += position_embeddings
    222         embeddings = self.LayerNorm(embeddings)
    223         embeddings = self.dropout(embeddings)

RuntimeError: The size of tensor a (1088) must match the size of tensor b (512) at non-singleton dimension 1
```
With flax-sentence-embeddings/all_datasets_v4_MiniLM-L12 I get the same error with different sizes:

```
RuntimeError: The size of tensor a (528) must match the size of tensor b (512) at non-singleton dimension 1
```
Note that I also get the error when fine-tuning nreimers/MiniLM-L6-H384-uncased.
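In case it helps narrow things down, this is the quick check I ran to compare each model's position-embedding limit with its tokenizer's truncation length (my assumption being that truncation=True falls back to tokenizer.model_max_length when no explicit max_length is given):

```python
from transformers import AutoConfig, AutoTokenizer

for name in [
    "sentence-transformers/paraphrase-distilroberta-base-v2",
    "sentence-transformers/paraphrase-MiniLM-L6-v2",
    "nreimers/MiniLM-L6-H384-uncased",
]:
    config = AutoConfig.from_pretrained(name)
    tokenizer = AutoTokenizer.from_pretrained(name)
    # max_position_embeddings is the hard limit of the model's
    # position-embedding matrix; model_max_length is what the tokenizer
    # truncates to when truncation=True is used without a max_length
    print(name, config.max_position_embeddings, tokenizer.model_max_length)
```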
Do you know where this could come from?
(Side-note: do you recommend using the flax-sentence-embeddings/all_datasets… models? I didn't find performance metrics for them on sbert.net. Are they better than the models in the ranking here: Pretrained Models — Sentence-Transformers documentation?)