Sentence transformers - SoftmaxLoss

I’m using sentence transformers with SoftmaxLoss to classify sentence pairs as positive or negative (much like the SNLI dataset). Here’s my code:

from sentence_transformers import SentenceTransformer, losses
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
loss_func = losses.SoftmaxLoss(model=model, num_labels=2, sentence_embedding_dimension=model.get_sentence_embedding_dimension())
evaluator = EmbeddingSimilarityEvaluator(...)
model.fit(train_objectives=[(train_dataloader, loss_func)], evaluator=evaluator)

However, I’m struggling to understand two points about this loss function:

  1. This loss function has a classifier head that maps the embeddings to a 2-dimensional vector, but it does not appear to be trained. Its weights seem to be initialized from a random uniform distribution, and as far as I can tell they are never adjusted, because the classifier layer is not part of the model; it’s part of the loss class. Is this correct, or am I missing something?
  2. Say I want to use the model for inference after fine-tuning. I encode() two sentences to get their embeddings, and then I have to imitate the loss’s behavior (e.g. with CosineSimilarityLoss or ContrastiveLoss I would just compute the cosine similarity between the two embedding vectors, compare it to a threshold, and get my label). With SoftmaxLoss, however, how am I supposed to transform the embeddings into a “distance value”? Since the classifier is not part of the model, I can’t use it.
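To make point 1 concrete, here is my understanding of what the classifier inside SoftmaxLoss does with a pair of embeddings, as a numpy sketch. The random weight matrix stands in for the classifier’s (possibly untrained?) weights, the (u, v, |u − v|) concatenation is the loss’s default setting, and 768 is just the embedding size of all-mpnet-base-v2:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, num_labels = 768, 2  # all-mpnet-base-v2 embedding size, 2 classes

# Stand-in for the classifier inside SoftmaxLoss: a single linear layer
# over the concatenation (u, v, |u - v|), producing one logit per label.
W = rng.normal(size=(num_labels, 3 * dim))  # random stand-in weights
b = np.zeros(num_labels)

u = rng.normal(size=dim)  # dummy embeddings in place of model.encode(...)
v = rng.normal(size=dim)

features = np.concatenate([u, v, np.abs(u - v)])  # shape (3 * dim,)
logits = W @ features + b                         # shape (num_labels,)
label = int(np.argmax(logits))                    # 0 or 1
```

So if I wanted to imitate this at inference time, I would need exactly these W and b from the loss object, which is what I can’t access through the model alone.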

Any guidance is much appreciated.