Hi
I’m using Sentence Transformers with SoftmaxLoss to classify sentence pairs as positive or negative (similar to the SNLI setup). Here’s my code:
from sentence_transformers import SentenceTransformer, losses
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
loss_func = losses.SoftmaxLoss(
    model=model,
    num_labels=2,
    sentence_embedding_dimension=model.get_sentence_embedding_dimension(),
)
evaluator = EmbeddingSimilarityEvaluator(...)
model.fit(
    train_objectives=[(train_dataloader, loss_func)],
    optimizer_params=optimizer_params,
    evaluator=evaluator,
    epochs=epochs,
    warmup_steps=warmup_steps,
    save_best_model=True,
    output_path=save_path,
)
However, I’m struggling to understand 2 points when using this loss function:
- This loss function has a classifier head that maps the pair of embeddings to a 2-dimensional vector, but it doesn’t seem to be trained. Its weights appear to be initialized from a random uniform distribution and, as far as I can tell, they are never updated, because the classifier layer is not part of the model; it lives inside the loss class. Is that correct, or am I missing something? (See the first snippet after this list.)
- Say I want to use the model for inference after fine-tuning. I encode() two sentences, get the embeddings, and then have to imitate the loss’s behavior (e.g. if it were a CosineSimilarityLoss or a ContrastiveLoss, I’d just compute the cosine similarity between the two embedding vectors, compare it to a threshold, and get my label). With SoftmaxLoss, however, how am I supposed to transform the embeddings into a “distance value”? Since the classifier is not part of the model, I can’t use it. (See the second snippet below for roughly what I mean.)
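To illustrate the first point, this is how I’ve been checking where the classifier lives (a quick sketch; the id-based parameter check is just my ad hoc way of confirming that the classifier’s weights are not among the model’s parameters):

# The classifier seems to live on the loss object, not on the SentenceTransformer:
print(loss_func.classifier)
# -> Linear(in_features=2304, out_features=2, bias=True)
#    (3 * 768 with the default concatenation, if I read the SoftmaxLoss source right)

# None of the classifier's parameters appear among the model's parameters:
model_param_ids = {id(p) for p in model.parameters()}
print(any(id(p) in model_param_ids for p in loss_func.classifier.parameters()))  # False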
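And to make the second point concrete, this is roughly what I imagine inference would look like (a sketch only; the 0.5 threshold and the [u, v, |u - v|] concatenation are my assumptions from reading the SoftmaxLoss defaults, and I’m ignoring device placement here):

import torch
from sentence_transformers import util

emb_a, emb_b = model.encode(["sentence A", "sentence B"], convert_to_tensor=True)

# After CosineSimilarityLoss / ContrastiveLoss training I'd just threshold the similarity:
cosine_label = int(util.cos_sim(emb_a, emb_b) > 0.5)

# After SoftmaxLoss training I'd apparently need the classifier from the loss object,
# applied to the same [u, v, |u - v|] concatenation the loss uses internally:
features = torch.cat([emb_a, emb_b, torch.abs(emb_a - emb_b)], dim=-1)
logits = loss_func.classifier(features)
softmax_label = int(torch.argmax(logits))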
Any guidance is much appreciated!