Hi everyone!
I was wondering whether I can train a sentence transformer with a triplet loss (with or without labeled data), then freeze all of its layers and use the frozen model (or its embeddings) to fine-tune a classification head (e.g. a classic fully connected network) on the same data or on a held-out portion of the data.
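Just to make the idea concrete, here is a rough sketch of what I mean (not real training code; the checkpoint name, the triplets, `sentences`, `labels`, and `num_classes` are placeholders I made up):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Step 1: train the sentence transformer with a triplet loss
model = SentenceTransformer("bert-base-uncased")
train_examples = [
    InputExample(texts=["anchor sentence", "positive sentence", "negative sentence"]),
    # ... more (anchor, positive, negative) triplets
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.TripletLoss(model=model)
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1)

# Step 2: freeze the encoder and train only a classification head on its embeddings
for p in model.parameters():
    p.requires_grad = False

num_classes = 3  # placeholder
head = nn.Sequential(
    nn.Linear(model.get_sentence_embedding_dimension(), 256),
    nn.ReLU(),
    nn.Linear(256, num_classes),
)
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

sentences = ["a labeled sentence", "another labeled sentence"]  # placeholder
labels = torch.tensor([0, 1])                                    # placeholder

# the frozen encoder only produces features; gradients flow into the head
embeddings = model.encode(sentences, convert_to_tensor=True).cpu()
logits = head(embeddings)
loss = criterion(logits, labels)
loss.backward()
optimizer.step()
```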
Alternatively, can I train a classic transformer (a BERT cross-encoder) with a classification head using a standard classification loss such as cross entropy, but instead of feeding the CLS token embedding into the head, feed it an embedding obtained by max pooling or average pooling over all token embeddings from the transformer's last layer?
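Again, a rough sketch of this second setup, just so it's clear what I mean by pooling over the last layer (the model name, `num_classes`, and the example batch are placeholders):

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class PooledClassifier(nn.Module):
    def __init__(self, model_name="bert-base-uncased", num_classes=2, pooling="mean"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        self.pooling = pooling
        self.head = nn.Linear(self.encoder.config.hidden_size, num_classes)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        hidden = out.last_hidden_state                   # (batch, tokens, hidden)
        mask = attention_mask.unsqueeze(-1).float()      # (batch, tokens, 1)
        if self.pooling == "mean":
            # average over real tokens only, ignoring padding
            pooled = (hidden * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
        else:
            # max pooling: push padded positions to -inf-ish so they never win
            pooled = hidden.masked_fill(mask == 0, -1e9).max(dim=1).values
        return self.head(pooled)                         # logits for cross entropy

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = PooledClassifier(pooling="mean")
batch = tokenizer(["an example sentence"], return_tensors="pt", padding=True)
logits = model(batch["input_ids"], batch["attention_mask"])
loss = nn.CrossEntropyLoss()(logits, torch.tensor([0]))  # placeholder label
```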
I ask because I am curious whether the feature space of a sentence transformer and that of a classic transformer are different but both still useful for the classification task.
Thank you!