I would like to know which loss function was used to train this model, and how I could have found this out without asking here! The Hugging Face page for this model is rather brief.
As I understand the SBERT paper, there are two possible training objectives. One directly uses the cosine similarity between the two sentence embeddings. The other concatenates the two sentence vectors u and v (together with their element-wise difference |u − v|) and passes the result to a softmax classifier over the classes 0, 1, -1.
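To make the distinction concrete, here is a minimal pure-Python sketch of the two objectives as I read them in the SBERT paper: a regression loss (squared error between cos(u, v) and a gold similarity score) and a classification loss (softmax over W · [u; v; |u − v|] with cross-entropy). All function names and the weight layout are hypothetical, chosen for illustration only; this is not the sentence-transformers implementation, and it says nothing about which objective this particular model was trained with.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def regression_loss(u: Sequence[float], v: Sequence[float], gold_score: float) -> float:
    """Regression objective (hypothetical helper): squared error between cos(u, v) and the gold score."""
    return (cosine_similarity(u, v) - gold_score) ** 2


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max logit before exponentiating)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classification_loss(
    u: Sequence[float],
    v: Sequence[float],
    weights: Sequence[Sequence[float]],
    gold_label: int,
) -> float:
    """Classification objective (hypothetical helper): cross-entropy of softmax(W · [u; v; |u - v|]).

    weights has one row per class, each row of length 3 * len(u).
    gold_label is the class *index* (0..num_classes-1); labels such as -1
    would first have to be mapped to an index (an assumption of this sketch).
    """
    features = list(u) + list(v) + [abs(a - b) for a, b in zip(u, v)]
    logits = [sum(w * f for w, f in zip(row, features)) for row in weights]
    probs = softmax(logits)
    return -math.log(probs[gold_label])


# Usage: identical vectors with gold score 1.0 give zero regression loss;
# an all-zero classifier assigns 1/3 to each of three classes, so the loss is log(3).
u, v = [1.0, 2.0], [0.5, 1.0]
print(regression_loss([1.0, 0.0], [1.0, 0.0], 1.0))
print(classification_loss(u, v, [[0.0] * 6 for _ in range(3)], 2))
```

The practical difference is what the labels look like: the regression objective needs a continuous similarity score per pair, while the classification objective needs a discrete class per pair and trains an extra weight matrix W that is discarded after training.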
Which of the two was used to train this model?
Thank you in advance