Could someone help me understand why a Dropout layer appears as the last layer of the TFDistilBertForSequenceClassification model?
Model training on [toxic]
--------------------------------------------------------------------------------
Model: "tf_distil_bert_for_sequence_classification_2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
distilbert (TFDistilBertMain multiple 66362880
_________________________________________________________________
pre_classifier (Dense) multiple 590592
_________________________________________________________________
classifier (Dense) multiple 1538
_________________________________________________________________
dropout_59 (Dropout) multiple 0 <----
=================================================================
Total params: 66,955,010
Trainable params: 592,130
Non-trainable params: 66,362,880
I expected the last layer to be classifier (Dense), but it is Dropout. The output is logits of shape (batch_size, num_labels), so I'm not sure why the Dropout layer is there.
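My guess is that summary() lists layers in the order they are created in __init__, not the order they are applied in call(). Here is a toy sketch (a hypothetical ToyClassifier with a Dense stand-in for the DistilBERT encoder, not the actual Hugging Face code) that reproduces the same effect:

```python
import tensorflow as tf

class ToyClassifier(tf.keras.Model):
    """Toy stand-in mirroring the layer layout of the classification head."""

    def __init__(self):
        super().__init__()
        # Attribute creation order determines the order shown by model.summary()
        self.backbone = tf.keras.layers.Dense(8, name="backbone")  # stand-in for distilbert
        self.pre_classifier = tf.keras.layers.Dense(8, name="pre_classifier")
        self.classifier = tf.keras.layers.Dense(2, name="classifier")
        self.dropout = tf.keras.layers.Dropout(0.2, name="dropout")  # created last

    def call(self, inputs, training=False):
        x = self.backbone(inputs)
        x = self.pre_classifier(x)
        x = self.dropout(x, training=training)  # applied BEFORE the classifier at call time
        return self.classifier(x)               # logits of shape (batch_size, num_labels)

model = ToyClassifier()
model(tf.zeros((1, 4)))  # build the model with a dummy batch
model.summary()          # Dropout is listed last (creation order), despite the call order
```

So the summary listing Dropout last would not necessarily mean it is applied last. Is that the right reading of the DistilBERT model?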
Any help is appreciated.
Regards,
mon