(Auto) Sequence Classification model with triplets / contrastive loss

carschno · July 15, 2021, 2:36pm

Hi,
I am trying to train a cross-encoder and/or bi-encoder fine-tuned on a custom data set with about 30k entries. This takes place in a search context, and the annotations are query-document pairs each labeled as relevant (positive) or irrelevant (negative).

In order to train a text classification model using the query-document pairs, I have been following the “Sequence Classification with IMDb Reviews” guide.

This is how I encode my data for a simple text classifier (“relevant” vs. “irrelevant”):

def encode(examples):
    # Encode each query-document pair as a single two-segment input
    return tokenizer(
        examples['queryTerm'],
        examples['text'],
        truncation=True,
        padding='max_length',
    )

I want to proceed by training a cross-encoder with a contrastive loss on triplets (a query with two documents annotated with different classes), as discussed in the Sentence-BERT paper, among others.
I am wondering about the internals of the Auto model for sequence classification. Does it make sense to adapt my encode() function so that it calls the tokenizer roughly like this:

tokenizer(queryTerm, example1["text"], example2["text"])
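For context, this is roughly how I would derive the triplets from my labeled pairs. It is only a sketch: build_triplets is a hypothetical helper, and the 'label' field (1 = relevant, 0 = irrelevant) is an assumed name, since my actual label column may be called something else.

```python
from collections import defaultdict
from itertools import product


def build_triplets(rows):
    """Group labeled (query, document) pairs by query and emit
    (query, relevant_text, irrelevant_text) triplets."""
    positives = defaultdict(list)
    negatives = defaultdict(list)
    for row in rows:
        # 'label' is an assumed field name: 1 = relevant, 0 = irrelevant
        bucket = positives if row["label"] == 1 else negatives
        bucket[row["queryTerm"]].append(row["text"])

    triplets = []
    for query, pos_texts in positives.items():
        # Pair every relevant document with every irrelevant one for the same query
        for pos, neg in product(pos_texts, negatives.get(query, [])):
            triplets.append((query, pos, neg))
    return triplets
```

Queries that have only relevant or only irrelevant documents produce no triplets with this approach, so part of the 30k pairs may go unused.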

Furthermore, can I use an Auto model for training a bi-encoder, again trained on triplets? What is the recommended approach for this use case?
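To make the bi-encoder objective concrete, this is the kind of triplet loss I have in mind, sketched in plain Python on embeddings that have already been computed. The function names and the margin of 1.0 are my own placeholders; in practice the vectors would come from encoding the query and each document separately and pooling the encoder output.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Triplet objective: max(d(anchor, positive) - d(anchor, negative) + margin, 0).

    The loss is zero once the relevant document is closer to the query
    than the irrelevant one by at least `margin`.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)
```

Here the anchor would be the query embedding, the positive the relevant document, and the negative the irrelevant one.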

ppaudel · September 20, 2023, 5:11pm

Hi @carschno. Did you figure out a way to proceed with this?
