Hi everyone, I was looking into the source code for BertForSequenceClassification and RobertaForSequenceClassification. BertForSequenceClassification feeds its classification head with the pooled_output produced by BertPooler. RobertaForSequenceClassification, on the other hand, takes the sequence_output instead of the pooled_output and reimplements essentially the same pooling inside its own classification head, but with an additional dropout layer.
If I unravel and compare the two code paths, the RoBERTa head clearly contains an extra dropout layer.
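To make the comparison concrete, here is a minimal sketch of the two paths as I read them from the transformers source. The class names BertStyleHead and RobertaStyleHead are my own simplification, and the real forward methods take more arguments, but the ordering of dropout/dense/tanh is what I am asking about:

```python
import torch
from torch import nn

class BertStyleHead(nn.Module):
    """BERT path: BertPooler (dense + tanh on [CLS]) -> dropout -> classifier."""
    def __init__(self, hidden_size, num_labels, dropout_prob=0.1):
        super().__init__()
        self.pooler_dense = nn.Linear(hidden_size, hidden_size)
        self.dropout = nn.Dropout(dropout_prob)
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, sequence_output):
        cls_token = sequence_output[:, 0]                 # [CLS] hidden state
        pooled = torch.tanh(self.pooler_dense(cls_token)) # this is the pooled_output
        return self.classifier(self.dropout(pooled))      # single dropout


class RobertaStyleHead(nn.Module):
    """RoBERTa path: sequence_output -> dropout -> dense + tanh -> dropout -> out_proj."""
    def __init__(self, hidden_size, num_labels, dropout_prob=0.1):
        super().__init__()
        self.dense = nn.Linear(hidden_size, hidden_size)
        self.dropout = nn.Dropout(dropout_prob)
        self.out_proj = nn.Linear(hidden_size, num_labels)

    def forward(self, sequence_output):
        x = sequence_output[:, 0]          # <s> token (RoBERTa's equivalent of [CLS])
        x = self.dropout(x)                # extra dropout before the dense layer
        x = torch.tanh(self.dense(x))
        x = self.dropout(x)
        return self.out_proj(x)


# quick shape check with random hidden states
hidden = torch.randn(2, 8, 768)                    # (batch, seq_len, hidden_size)
print(BertStyleHead(768, 3)(hidden).shape)         # torch.Size([2, 3])
print(RobertaStyleHead(768, 3)(hidden).shape)      # torch.Size([2, 3])
```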
Why not simply use the RobertaPooler / pooled_output like BERT does? What is the significance of this extra dropout layer?