While using the DistilBERT model from Hugging Face, I noticed that there is a dropout layer after the classification layer, before the softmax is applied. Why are we dropping out information at that point? It seems like a bad idea to me, but I want to understand it better, because Hugging Face set this to 0.2 as the default parameter. Is there a good reason behind this?
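For reference, here is a minimal sketch (assuming the `transformers` library and its `DistilBertForSequenceClassification` class) of how I am inspecting the classification head and the config field that seems to control this dropout:

```python
# Minimal sketch, assuming the transformers library is installed.
# Inspects the config field that appears to control this dropout and
# the layers that make up the sequence-classification head.
from transformers import DistilBertConfig, DistilBertForSequenceClassification

config = DistilBertConfig()
print(config.seq_classif_dropout)  # 0.2 by default

# Randomly initialized model, only used here to look at the head structure
model = DistilBertForSequenceClassification(config)
print(model.pre_classifier)  # Linear(in_features=768, out_features=768)
print(model.dropout)         # Dropout(p=0.2)
print(model.classifier)      # Linear(in_features=768, out_features=2)
```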