Dropout as the final layer in a pretrained model (DistilBERT)

Please help me understand the purpose of the Dropout layer as the last layer of the TFDistilBertForSequenceClassification model.

Model training on [toxic]
--------------------------------------------------------------------------------
Model: "tf_distil_bert_for_sequence_classification_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
distilbert (TFDistilBertMain multiple                  66362880  
_________________________________________________________________
pre_classifier (Dense)       multiple                  590592    
_________________________________________________________________
classifier (Dense)           multiple                  1538      
_________________________________________________________________
dropout_59 (Dropout)         multiple                  0         <----
=================================================================
Total params: 66,955,010
Trainable params: 592,130
Non-trainable params: 66,362,880

I expected the last layer to be classifier (Dense), but it is Dropout. The output is logits of shape (batch_size, num_labels), so I am not sure why the Dropout layer is there.
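
From a bit of poking around, I suspect that summary() lists sublayers in the order they were assigned in __init__, not the order they run in call(). Here is a minimal toy model (my own sketch, not the actual HF source) that reproduces the same ordering:

import tensorflow as tf

# Toy sketch: in a subclassed Keras model, summary() lists sublayers
# in the order they were assigned in __init__, not the order call() uses them.
class ToyClassifier(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.pre_classifier = tf.keras.layers.Dense(8, name="pre_classifier")
        self.classifier = tf.keras.layers.Dense(2, name="classifier")
        # assigned last, so it prints last in summary() ...
        self.dropout = tf.keras.layers.Dropout(0.2, name="dropout")

    def call(self, inputs, training=False):
        x = self.pre_classifier(inputs)
        # ... but it actually runs between the two Dense layers
        x = self.dropout(x, training=training)
        return self.classifier(x)  # logits of shape (batch_size, 2)

model = ToyClassifier()
model(tf.zeros((1, 4)))  # build the layers with a dummy batch
model.summary()          # Dropout is listed last here too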

Any help is appreciated.

Regards,
mon


I have the same question now! Hopefully, someone will reply.

I am sure I am missing something and cannot fully understand the purpose of this.

In my case, the model was:

from transformers import DistilBertForSequenceClassification

model = DistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=4,
    output_attentions=False,
    output_hidden_states=False,
)
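
As far as I can tell, print(model) has the same quirk: PyTorch lists submodules in the order they were registered in __init__, which need not match the order forward() uses them. A quick hook-based sketch (it just iterates over whatever named_children() reports) prints the order the layers actually execute:

import torch

# Sketch: forward hooks fire in execution order, which can differ from
# the registration order shown by print(model).
for name, module in model.named_children():
    module.register_forward_hook(
        lambda mod, inp, out, name=name: print("ran:", name)
    )

dummy = torch.tensor([[101, 102]])  # dummy [CLS]/[SEP] token ids
model(input_ids=dummy)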

Thanks in advance! 🙂