Info regarding sentence-transformers

prajjwal1 · August 21, 2020, 6:35am

sentence-transformers were recently added in the model hub. I wanted to know the following if :

all these models were trained on NLI (which dataset ? MNLI)
for how many epochs ?
If I want to use it for inference on NLI task, do I need to train an additional linear layer ? So freeze the AutoModel, train a nn.Linear for 1 epoch on the mean output and then perform inference. Can you please confirm this usage ?

tfmr = AutoModel.from_pretrained("sentence-transformers/bert-base-nli-mean-tokens")

class Model(nn.Module):
    def __init__(self, model):
        super(model, self).__init__()
        self.encoder = model
        for param in self.encoder.parameters():
            param.requires_grad = False
        self.classifier = nn.Linear(768, 3)
        self.criterion = nn.CrossEntropy()

    def forward(self, inputs): # x will come from dataloader using default_data_collator

        labels = inputs.pop('labels')
        model_output = self.encoder(**inputs)
        sentence_embeddings = mean_pooling(model_output,  inputs['attention_mask'])
        logits = self.classifier(sentence_embeddings)
        loss = self.criterion(logits, labels)
        return loss, logits

model = Model(tfmr)

prajjwal1 · August 21, 2020, 1:20pm

I trained it on MNLI with the model frozen, it gets around 46% accuracy after epoch 1. Seems like it has not been trained on MNLI. Can anyone confirm ?

valhalla · August 21, 2020, 3:57pm

from the paper, section 3.1

We train SBERT on the combination of the SNLI
(Bowman et al., 2015) and the Multi-Genre NLI dataset.

So yes, it’s trained on MNLI .

My guess is as it’s trained using Triplet loss to produce embeddings that are semantically meaningful
and can be compared with cosine-similarity, it might nor perform well on just classification .

prajjwal1 · August 22, 2020, 2:43am

Thanks for the information. When I read the paper, I initially had the impression that it was trained on STS task. Given that it has been trained on both SNLI and MNLI (not sure about the percentage), I’m getting 46% accuracy when I train a MLP for one epoch. I know the objective is a bit different (not classification) but I’d expect it to do well (in frozen condition) given that it has seen both NLI datasets. Things don’t improve much even after 3 epochs.

prajjwal1 · August 23, 2020, 11:18am

On what objective is this model trained on ? Section 3 of the paper mentions three objective functions classification, regression and triplet. As per Fig 2. they seem to be using regression for computing similarity scores at inference. As per Fig 1. it says that they used classification objective for fine-tuning. Based on the performance I’m getting, I doubt that these weights are coming from model trained on classification. Can someone please clarify ?

valhalla · August 23, 2020, 4:57pm

I have the same confusion, it’s not clear what was the final objective from those 3

prajjwal1 · August 25, 2020, 6:16am

@joeddav If you’ve some idea, can you please confirm this ?

joeddav · August 25, 2020, 2:28pm

I’m afraid I don’t have any extra insight here. I might head over to the sentence-transformers repo and see if you can find an answer or open an issue there (and then loop us back in over here once you have an answer )

prajjwal1 · August 26, 2020, 4:21am

I’ve opened an issue. You can follow it if interested.

Topic		Replies	Views
Can we train Sentence transformer model for Sequence classification 🤗Transformers	5	6557	June 14, 2023
Transformer vs Sentence-Transformer for text classification Intermediate	0	2205	March 12, 2024
What are some recommended pretrained models for extracting semantic feature on single sentence? Research	4	1503	December 14, 2020
Sentence transformers - SoftmaxLoss Models	1	968	June 20, 2024
Distilbert-base-nli-stsb-mean-tokens OOM encoding sentences of 100K docs Beginners	4	685	February 9, 2021

Info regarding sentence-transformers

Related topics