Can we train a Sentence Transformer model for sequence classification?

Can we use a fine-tuned Sentence Transformer model for fine-tuning AutoModelForSequenceClassification()?

    • I have a fine-tuned Sentence Transformer model, which also has pooling layers.
    • Now I take this model and fine-tune it again with AutoModelForSequenceClassification() for binary classification.

I only get this warning:

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at sent_tranf_model/ and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

Code:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = AutoTokenizer.from_pretrained("sent_tranf_model/")
NUM_LABELS = len(idx_to_label)  # idx_to_label maps class indices to label names
model = AutoModelForSequenceClassification.from_pretrained("sent_tranf_model/", num_labels=NUM_LABELS)

Is it okay to do this?

@ehalit

I don’t believe you will get consistent and meaningful classification results with the randomly initialized parameters in the classifier layer, as the warning message suggests. You can fine-tune the model if you have labeled data, but I wouldn’t expect a performance gain over BERT/RoBERTa fine-tuning. The advantage of using sentence transformers becomes apparent in the unsupervised setting.

If you want to solve a classification task with sentence transformer models, you can exploit similarity metrics between the embedding of the text to be classified and a representation embedding of each class. The question is how to find those representation embeddings. Conventional unsupervised methods like clustering can be useful, but you would need to map (perhaps manually) the generated clusters to the classes you have. Using a more modern approach, you may describe the properties of a class with a prompt, get its embedding and treat it as the representation embedding.
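For example, a rough sketch of the prompt-based variant might look like the following; the model path, class names and prompts are placeholders for illustration, not part of your setup:

# A minimal sketch of similarity-based classification with a sentence transformer.
# "sent_tranf_model/" and the class prompts below are placeholders.
from sentence_transformers import SentenceTransformer, util

st_model = SentenceTransformer("sent_tranf_model/")

# One descriptive prompt per class; its embedding serves as the class representation.
class_prompts = {
    "positive": "This text expresses a positive opinion.",
    "negative": "This text expresses a negative opinion.",
}
class_embeddings = st_model.encode(list(class_prompts.values()), convert_to_tensor=True)

def classify(text):
    text_embedding = st_model.encode(text, convert_to_tensor=True)
    # Cosine similarity between the text and every class representation.
    scores = util.cos_sim(text_embedding, class_embeddings)[0]
    return list(class_prompts.keys())[int(scores.argmax())]

print(classify("The product broke after two days."))  # expected: "negative"

The class with the most similar representation embedding wins; how well this works depends heavily on how well the prompts describe your classes.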

TL;DR: if you use the model for inference directly after loading it like this, it will not generate meaningful classifications. You can fine-tune it after initializing it like this, but I wouldn’t expect a performance gain compared to BERT/RoBERTa fine-tuning. If you don’t have labeled data, you can use unsupervised learning to match sentence embeddings with class embeddings.

What if I fine-tune this for 10-15 epochs with labelled data so the classifier weights become meaningful? Won’t it work in that case?

Sure, if you fine-tune the model with your data, it will work for your task. As I said, I don’t see a substantial advantage over fine-tuning other models such as BERT/RoBERTa, but it can certainly provide one depending on the specifics of the downstream task. The problem is the random initialization of the classifier layer, and fine-tuning solves that.

As always, the performance of the fine-tuned model depends on the quality of the data and hyperparameters.
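As a rough reference, a minimal fine-tuning sketch with the Hugging Face Trainer might look like the following; train_texts, train_labels and the hyperparameter values are placeholders, not recommendations:

# A minimal fine-tuning sketch with the Hugging Face Trainer.
# train_texts and train_labels are placeholders for your labeled data.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("sent_tranf_model/")
model = AutoModelForSequenceClassification.from_pretrained("sent_tranf_model/", num_labels=2)

train_dataset = Dataset.from_dict({"text": train_texts, "label": train_labels})
train_dataset = train_dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, padding="max_length"),
    batched=True,
)

args = TrainingArguments(
    output_dir="clf_out",
    num_train_epochs=10,            # e.g. the 10-15 epochs mentioned above
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)
Trainer(model=model, args=args, train_dataset=train_dataset).train()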

One advantage is a smaller model size and faster training. Agree?

Well, I don’t know which specific checkpoint you are referring to, but these two models have exactly the same number of parameters:
[screenshot: both checkpoints report identical parameter counts]

So there isn’t a model size reduction, but arguably the Sentence-BERT model is trained further after the standard pretraining process, which might result in a better harnessing of general knowledge.
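If you want to verify this yourself, you can count the parameters directly; the checkpoint names below are just illustrative examples of a plain BERT and a Sentence-BERT backbone, not necessarily the ones in the screenshot:

# Quick sanity check: count the parameters of the two backbones.
from transformers import AutoModel

for name in ["bert-base-uncased", "sentence-transformers/bert-base-nli-mean-tokens"]:
    backbone = AutoModel.from_pretrained(name)
    n_params = sum(p.numel() for p in backbone.parameters())
    print(f"{name}: {n_params:,} parameters")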