I’m trying to group Spanish texts by semantic similarity. I’m using SentenceTransformer
to obtain the embeddings for my texts, but I have a couple of questions:
- Neither of the models I’m currently using (hiiamsid/sentence_similarity_spanish_es and sentence-transformers/distiluse-base-multilingual-cased-v1) outputs normalized embeddings. SentenceTransformer gives me the option to normalize the embeddings; is there any reason to do this?
- Is there any difference in terms of performance between cosine similarity and the dot score?
- Why are some models normalized and others not?
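
To make the question concrete, here is a minimal sketch of what I understand normalization to do. It uses plain Python with made-up 3-dimensional vectors instead of real model output, and the helper names (`dot`, `l2_normalize`, `cosine`) are mine, not part of any library. As far as I can tell, once the vectors are scaled to unit length, the dot score and the cosine similarity give the same number:

```python
import math

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(v):
    """Scale a vector to unit (L2) length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity: dot product divided by both vector lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Made-up "embeddings" for illustration only (not real model output).
a = [3.0, 4.0, 0.0]
b = [1.0, 2.0, 2.0]

raw_dot = dot(a, b)                                # sensitive to vector length
cos = cosine(a, b)                                 # depends only on the angle
norm_dot = dot(l2_normalize(a), l2_normalize(b))   # dot score after normalizing

print(raw_dot, cos, norm_dot)
```

In sentence-transformers I believe the equivalent calls are `model.encode(texts, normalize_embeddings=True)` together with `util.cos_sim` and `util.dot_score`, so my questions are really about when the raw (unnormalized) dot score is preferable.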