Should I normalize SentenceTransformers embeddings?

aepenaflor · March 11, 2024, 10:58pm

I’m trying to group Spanish texts by semantic similarity. I’m using SentenceTransformer to obtain the embeddings for my texts, but I have a couple of questions:

Neither of the models I’m currently using (hiiamsid/sentence_similarity_spanish_es and sentence-transformers/distiluse-base-multilingual-cased-v1) produces normalized embeddings. SentenceTransformer gives me the option to normalize the embeddings; is there any reason to do this?
Is there any difference in terms of performance between cosine similarity and the dot score?
Why do some models produce normalized embeddings while others do not?
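
To make the second question concrete, here is a minimal sketch of how cosine similarity and the dot score relate once vectors are scaled to unit length. It uses plain Python lists with made-up values as stand-ins for embeddings; the helper names (`l2_normalize`, `dot`, `cosine`) are hypothetical and not part of SentenceTransformers. As far as I can tell, this scaling is what the library's normalization option (`normalize_embeddings=True` in `encode`) applies, but treat that as an assumption.

```python
import math

def l2_normalize(vec):
    """Scale a vector so that its L2 norm is 1."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity: dot product divided by both vector lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Toy vectors standing in for two sentence embeddings (made-up values).
emb_a = [3.0, 4.0, 0.0]
emb_b = [1.0, 2.0, 2.0]

raw_dot = dot(emb_a, emb_b)    # depends on the lengths of the vectors
cos = cosine(emb_a, emb_b)     # independent of the lengths
norm_dot = dot(l2_normalize(emb_a), l2_normalize(emb_b))

print(raw_dot, cos, norm_dot)
```

On unit-length vectors the dot score and cosine similarity give the same number, so the two only differ when the embeddings are left unnormalized and their lengths vary.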
