Hi all!
Cheers to this big community (and my first post here!)
I am trying to fine-tune a sentence transformer model. The data I have contains the following columns:
- raw_text - the raw chunks of text
- label - the corresponding label for the text: True or False.
I want to fine-tune a sentence transformer model so that the embeddings are optimized in a way that all the True sentences end up closer to each other in the vector space than to any of the False sentences.
I have been reading about the losses in the Loss Overview — Sentence-Transformers documentation.
I am really confused about which loss to use for my type of data and use case. I was leaning toward a few of them because they seem to match my data format, but as I read more about these losses and the way they are computed using anchor, positive, and negative samples, I feel less confident about using them, since my data does not have those kinds of pairs.
Can someone here help me understand whether what I am trying to do is possible with the existing losses in the Sentence Transformers library?
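For context, here is a minimal sketch of the kind of setup I have in mind. The rows, the base model (all-MiniLM-L6-v2), and the choice of BatchAllTripletLoss are just placeholders on my side, not something I'm sure is correct:

```python
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

# Made-up rows in the same shape as my data: one sentence + a True/False label
# (True -> 1, False -> 0).
train_examples = [
    InputExample(texts=["some chunk of text that is labeled True"], label=1),
    InputExample(texts=["another chunk labeled True"], label=1),
    InputExample(texts=["a chunk labeled False"], label=0),
    InputExample(texts=["another chunk labeled False"], label=0),
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder base model
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)

# BatchAllTripletLoss mines the anchor/positive/negative triplets itself from
# the class labels within each batch, so the input is just single sentences
# plus their labels (no pre-built pairs or triplets needed).
train_loss = losses.BatchAllTripletLoss(model=model)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    warmup_steps=10,
)
```

If I understand the docs correctly, the batch triplet losses build the anchor/positive/negative combinations from the labels at training time, so maybe my single-sentence-plus-label format is enough, but I'd appreciate confirmation (or a pointer to a more suitable loss).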