Fine-tuning a sentence transformer model with my own data

Hi all!

Cheers to this big community (and my first post here :mega:)

I am trying to fine-tune a sentence transformer model. My data has the following columns:

  1. raw_text - the raw chunks of text
  2. label - corresponding label for the text - True or False.

I want to fine-tune a sentence transformer model so that, in the resulting embedding space, all the True sentences are closer together than the False sentences.

I have been reading about the losses from Loss Overview — Sentence-Transformers documentation

I am really confused about which loss to use for my type of data and use case. I am leaning towards the below:

since it matches my data format. However, as I read more about these losses and how they are computed from anchor, positive, and negative samples, I feel less confident about using them, since my data does not contain these kinds of pairs.

Can someone here help me understand whether what I am trying to do is feasible with the existing losses in the Sentence Transformers library?
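
On the concern about missing anchor/positive/negative samples: labeled rows can usually be turned into triplets, because any two sentences with the same label can act as anchor and positive, and any sentence with the other label can act as the negative. The snippet below is only a minimal sketch of that idea in plain Python. The function name `make_triplets` and the example rows are made up for illustration, and it assumes that sentences sharing a label (including the False ones) should be pulled together, which may or may not match the intended use case. As far as I understand the Loss Overview page, the batch triplet losses (for example `BatchAllTripletLoss`) take single sentences with a class label and form triplets inside each batch, so explicit triplets like these may not even be needed.

```python
import random


def make_triplets(rows, seed=0):
    """Build (anchor, positive, negative) triplets from (raw_text, label) rows.

    Hypothetical helper: the positive shares the anchor's label,
    the negative comes from any other label.
    """
    rng = random.Random(seed)

    # Group the texts by their label.
    by_label = {}
    for text, label in rows:
        by_label.setdefault(label, []).append(text)

    triplets = []
    for label, same in by_label.items():
        # All texts that carry a different label are negative candidates.
        others = [t for l, ts in by_label.items() if l != label for t in ts]
        if len(same) < 2 or not others:
            continue  # cannot form a triplet for this label
        for i, anchor in enumerate(same):
            positive = rng.choice(same[:i] + same[i + 1:])  # same label, not the anchor itself
            negative = rng.choice(others)
            triplets.append((anchor, positive, negative))
    return triplets


# Made-up rows in the (raw_text, label) format described above.
rows = [
    ("refund was processed quickly", True),
    ("support resolved my issue", True),
    ("great help from the team", True),
    ("the package never arrived", False),
    ("still waiting for a reply", False),
]
triplets = make_triplets(rows)
for anchor, positive, negative in triplets:
    print(anchor, "|", positive, "|", negative)
```

Each row produces one triplet here; sampling several positives and negatives per anchor is a straightforward extension.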


Refer to this blog post: Train and Fine-Tune Sentence Transformers Models. It has a section on the various cases your dataset may fall into and which loss functions can be used for each.

I am also trying to train a Sentence Transformer model. I checked the link as well, but I always get an accuracy of only 20 to 30%.

  1. Could the base model be the problem? Which base model should I train?
  2. How to evaluate these models?
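
On the evaluation question, one simple check for label-separated embeddings is leave-one-out nearest-neighbor accuracy: for every sentence, find its most similar other sentence by cosine similarity and see whether the two share a label. The sketch below uses plain Python and toy 2-dimensional vectors that are invented for illustration. In practice the vectors would come from the model (for example via `model.encode(...)`), and the helper name `nearest_neighbor_accuracy` is hypothetical, not part of the library.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbor_accuracy(embeddings, labels):
    """Fraction of items whose most similar *other* item has the same label."""
    correct = 0
    for i, emb in enumerate(embeddings):
        # Index of the closest embedding, excluding the item itself.
        best_j = max(
            (j for j in range(len(embeddings)) if j != i),
            key=lambda j: cosine(emb, embeddings[j]),
        )
        correct += labels[best_j] == labels[i]
    return correct / len(embeddings)


# Toy embeddings (assumed values, not model output).
embeddings = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
labels = [True, True, False, False]
accuracy = nearest_neighbor_accuracy(embeddings, labels)
print(accuracy)
```

Running the same measurement on a held-out split with the base model and again with the fine-tuned model gives a rough sense of whether training moved the embeddings in the intended direction.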