Hi everyone,
I’m using sentence transformers to compute similarity between texts (embedding, mean pooling, cosine similarity).
I get the impression that, beyond the meaning of the text, its length plays a huge role in the similarity score.
Has anyone experienced the same thing?
sebastien
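For readers following along, here is a minimal pure-Python sketch of the two steps named in the question, mean pooling and cosine similarity. It does not use the sentence-transformers library or a real model: the token vectors and masks are made-up toy values, and the helper names `mean_pool` and `cosine_similarity` are hypothetical, chosen only for illustration.

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average the token vectors, skipping padded positions (mask == 0)."""
    dim = len(token_vectors[0])
    summed = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                summed[i] += value
    if count == 0:
        raise ValueError("attention mask selects no tokens")
    return [s / count for s in summed]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy token embeddings (assumed values, not produced by any model).
# The short text has two real tokens and one padding position.
short_tokens = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
short_mask = [1, 1, 0]
# The long text has three real tokens.
long_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
long_mask = [1, 1, 1]

emb_short = mean_pool(short_tokens, short_mask)
emb_long = mean_pool(long_tokens, long_mask)
score = cosine_similarity(emb_short, emb_long)
```

Note that cosine similarity ignores the norm of the pooled vectors, so in a setup like this any length effect would have to come through the direction of the pooled embedding rather than its magnitude.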
I fine-tuned a model to compute similarity between names. This is a toy example:
name0 name1 label
Test Test y
Test Hi n
I fine-tuned the model on pairs of names, using the label as the target.
I also found that longer names tend to be predicted as y.
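One way to put a number on that impression might be to correlate the combined length of each name pair with the predicted label. The sketch below is only a rough diagnostic under assumptions: the helper names `pearson` and `length_vs_prediction` are hypothetical, length is measured in characters rather than tokens, and the `predictions` list is a made-up placeholder, not output from any model.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def length_vs_prediction(pairs, predictions):
    """Correlate combined character length of each pair with a y/n prediction."""
    lengths = [len(name0) + len(name1) for name0, name1 in pairs]
    scores = [1.0 if label == "y" else 0.0 for label in predictions]
    return pearson(lengths, scores)


# Placeholder inputs for illustration only; replace with real pairs and
# the labels your model actually predicts for them.
pairs = [("Test", "Test"), ("Test", "Hi"), ("Al", "Al"), ("Alexander", "Alexandre")]
predictions = ["y", "n", "n", "y"]
r = length_vs_prediction(pairs, predictions)
```

A value of `r` near zero on a held-out set would suggest length is not driving the predictions; a clearly positive value would be consistent with the effect described above, though it could also reflect how lengths are distributed across the y and n labels in the training data.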