Dropout as the final layer in a pretrained model (DistilBERT)

Please help me understand the purpose of the Dropout layer as the last layer of the TFDistilBertForSequenceClassification model.

Model training on [toxic]
--------------------------------------------------------------------------------
Model: "tf_distil_bert_for_sequence_classification_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
distilbert (TFDistilBertMain multiple                  66362880  
_________________________________________________________________
pre_classifier (Dense)       multiple                  590592    
_________________________________________________________________
classifier (Dense)           multiple                  1538      
_________________________________________________________________
dropout_59 (Dropout)         multiple                  0         <----
=================================================================
Total params: 66,955,010
Trainable params: 592,130
Non-trainable params: 66,362,880

I expected the last layer to be classifier (Dense), but it is Dropout. The output is logits of shape (batch_size, num_labels), so I am not sure why the Dropout layer is there.
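
From a bit of poking around, I suspect that summary() lists sublayers in the order they were assigned in __init__, not the order they run in call(). Here is a minimal toy model (my own sketch, not the actual HF source) that reproduces the same ordering:

import tensorflow as tf

# Toy sketch: in a subclassed Keras model, summary() lists sublayers
# in the order they were assigned in __init__, not the order call() uses them.
class ToyClassifier(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.pre_classifier = tf.keras.layers.Dense(8, name="pre_classifier")
        self.classifier = tf.keras.layers.Dense(2, name="classifier")
        # assigned last, so it prints last in summary() ...
        self.dropout = tf.keras.layers.Dropout(0.2, name="dropout")

    def call(self, inputs, training=False):
        x = self.pre_classifier(inputs)
        # ... but it actually runs between the two Dense layers
        x = self.dropout(x, training=training)
        return self.classifier(x)  # logits of shape (batch_size, 2)

model = ToyClassifier()
model(tf.zeros((1, 4)))  # build the layers with a dummy batch
model.summary()          # Dropout is listed last here too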

Any help is appreciated.

Regards,
mon


I have the same question now! Hopefully, someone will reply.

I am sure I am missing something and cannot fully understand the purpose of this.

In my case, the model was:

from transformers import DistilBertForSequenceClassification

model = DistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=4,
    output_attentions=False,
    output_hidden_states=False,
)
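
As far as I can tell, print(model) has the same quirk: PyTorch lists submodules in the order they were registered in __init__, which need not match the order forward() uses them. A quick hook-based sketch (it just iterates over whatever named_children() reports) prints the order the layers actually execute:

import torch

# Sketch: forward hooks fire in execution order, which can differ from
# the registration order shown by print(model).
for name, module in model.named_children():
    module.register_forward_hook(
        lambda mod, inp, out, name=name: print("ran:", name)
    )

dummy = torch.tensor([[101, 102]])  # dummy [CLS]/[SEP] token ids
model(input_ids=dummy)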

Thanks in advance! 🙂