Classifier Dropout for DecoderModelForSequenceClassification Classes

alphacentauri22 · October 25, 2024, 5:56pm

A common practice for classification on top of an encoder-only model is to add a classification head on top of the encoder's pooled output. The pooled output usually goes through a dropout layer before the linear classification layer, and the dropout probability is specified by classifier_dropout.
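
For concreteness, here is a minimal PyTorch sketch of the kind of head I mean. The class name `PooledClassificationHead`, its arguments, and the toy shapes are made up for illustration; this is not the actual transformers code, just the dropout-then-linear pattern.

```python
import torch
from torch import nn


class PooledClassificationHead(nn.Module):
    """Hypothetical encoder-style head: dropout on the pooled output, then a linear layer."""

    def __init__(self, hidden_size: int, num_labels: int, classifier_dropout: float = 0.1):
        super().__init__()
        self.dropout = nn.Dropout(classifier_dropout)
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, pooled_output: torch.Tensor) -> torch.Tensor:
        # pooled_output has shape (batch, hidden_size)
        return self.classifier(self.dropout(pooled_output))


head = PooledClassificationHead(hidden_size=16, num_labels=3, classifier_dropout=0.1)
pooled = torch.randn(4, 16)  # random stand-in for the encoder's pooled outputs

head.train()
train_logits = head(pooled)  # dropout is active in training mode

head.eval()
eval_logits = head(pooled)  # dropout is a no-op in eval mode
```

In eval mode the head reduces to a plain linear layer, so the dropout only affects fine-tuning.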

However, I see that across a few common decoder models (Mistral, Llama, Phi3), the sequence classification head (in the *Model*ForSequenceClassification class) does not have a dropout layer. Dropout IS available, though, for the token classification heads of the same models.
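
To make the contrast concrete, below is a rough sketch of the two head styles as I understand them. The names (`last_token_pool`, `score`, `token_dropout`, `token_classifier`), the dropout value, and the toy shapes are my own assumptions, it assumes right padding, and it is not copied from the library.

```python
import torch
from torch import nn

hidden_size, num_labels, pad_token_id = 8, 2, 0


def last_token_pool(hidden_states: torch.Tensor, input_ids: torch.Tensor) -> torch.Tensor:
    # Pick the hidden state of the last non-pad token in each sequence
    # (assumes right padding).
    lengths = (input_ids != pad_token_id).sum(dim=1)
    last_idx = (lengths - 1).clamp(min=0)
    batch_idx = torch.arange(hidden_states.size(0))
    return hidden_states[batch_idx, last_idx]


# Sequence classification style: a linear layer applied directly, no dropout.
score = nn.Linear(hidden_size, num_labels, bias=False)

# Token classification style: dropout before a per-token linear layer.
token_dropout = nn.Dropout(0.1)
token_classifier = nn.Linear(hidden_size, num_labels)

input_ids = torch.tensor([[5, 6, 7, 0], [3, 4, 0, 0]])  # 0 is padding
hidden_states = torch.randn(2, 4, hidden_size)  # stand-in for decoder outputs

pooled = last_token_pool(hidden_states, input_ids)
seq_logits = score(pooled)  # shape (batch, num_labels)
tok_logits = token_classifier(token_dropout(hidden_states))  # shape (batch, seq_len, num_labels)
```

Adding dropout to the sequence head would be a one-line change (wrapping `pooled` in an `nn.Dropout` before `score`), which is part of why its absence surprised me.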

Is there any underlying reason why dropout is not used for sequence classification but is available for token classification?
