Trying to understand XForSequenceClassification heads

BramVanroy · September 23, 2020, 2:22pm

When answering this question, I found that BertForSequenceClassification doesn’t actually use pretrained linear weights for its classification layer. I had kind of expected that it did. Pretrained weights for that layer probably would not be useful for most tasks and would need fine-tuning anyway, but still.
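To make that concrete, here is a rough pure-Python sketch of the name matching that decides which weights get loaded from a checkpoint. It is not the library's actual loading code. The parameter names are a small hand-picked subset that I am assuming matches a BERT pretraining checkpoint and a BertForSequenceClassification model, and `split_keys` is a made-up helper for illustration only.

```python
# Assumed, abbreviated subset of parameter names in a BERT pretraining checkpoint.
CHECKPOINT_KEYS = {
    "bert.embeddings.word_embeddings.weight",
    "bert.encoder.layer.0.attention.self.query.weight",
    "bert.pooler.dense.weight",
    "bert.pooler.dense.bias",
    "cls.predictions.transform.dense.weight",  # masked-LM head
    "cls.seq_relationship.weight",             # next-sentence-prediction head
    "cls.seq_relationship.bias",
}

# Assumed, abbreviated subset of parameter names in a sequence classification model.
MODEL_KEYS = {
    "bert.embeddings.word_embeddings.weight",
    "bert.encoder.layer.0.attention.self.query.weight",
    "bert.pooler.dense.weight",
    "bert.pooler.dense.bias",
    "classifier.weight",  # the linear classification layer
    "classifier.bias",
}


def split_keys(checkpoint_keys, model_keys):
    """Hypothetical helper: sort parameter names into three groups by name match."""
    loaded = sorted(model_keys & checkpoint_keys)             # copied from the checkpoint
    newly_initialized = sorted(model_keys - checkpoint_keys)  # no match, so freshly initialized
    unused = sorted(checkpoint_keys - model_keys)             # in the checkpoint but dropped
    return loaded, newly_initialized, unused


loaded, newly_initialized, unused = split_keys(CHECKPOINT_KEYS, MODEL_KEYS)
print("loaded from checkpoint:", loaded)
print("newly initialized:", newly_initialized)
print("unused checkpoint weights:", unused)
```

Under these assumptions the encoder and pooler weights are found in the checkpoint, while the classifier weights have no counterpart there and so start from a fresh initialization. The pretraining heads in the checkpoint are simply not used.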

Maybe the XForX models could have a line in their docs stating which of their heads are pretrained and which ones aren’t.
