I have a quick question regarding SequenceClassification with the RobertaClassificationHead: the implementation of the dense layer on top of the transformer has config.hidden_size x config.hidden_size
connections. From a theoretical point of view, would it make sense to let the user choose the dimension of the dense/projection layer? And if it does make sense, what would be the best way to do this right now?
Something like this would probably do what I expect:
self.dense = nn.Linear(config.hidden_size, config.proj_dim)
self.dropout = nn.Dropout(config.hidden_dropout_prob)
self.out_proj = nn.Linear(config.proj_dim, config.num_labels)
I arrived at this question while experimenting with a RoBERTa model whose parameters are frozen, training only the classification head. In my case, training more than half a million parameters in the classification head seems like overkill for my small dataset.
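For what it's worth, this is roughly the workaround I'm using at the moment: a copy of the head with a configurable width, swapped in after loading. It's only a sketch, and proj_dim=128 and the class name are my own choices, not anything provided by the library.

import torch
import torch.nn as nn
from transformers import RobertaForSequenceClassification

class SmallClassificationHead(nn.Module):
    """Same structure as RobertaClassificationHead, but with a configurable width."""

    def __init__(self, config, proj_dim):
        super().__init__()
        self.dense = nn.Linear(config.hidden_size, proj_dim)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        self.out_proj = nn.Linear(proj_dim, config.num_labels)

    def forward(self, features, **kwargs):
        x = features[:, 0, :]  # take <s> token (equiv. to [CLS])
        x = self.dropout(x)
        x = self.dense(x)
        x = torch.tanh(x)
        x = self.dropout(x)
        x = self.out_proj(x)
        return x

model = RobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

# Freeze the transformer and train only the (now much smaller) head.
for param in model.roberta.parameters():
    param.requires_grad = False

model.classifier = SmallClassificationHead(model.config, proj_dim=128)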
Original RobertaClassificationHead code for reference:
import torch
import torch.nn as nn

class RobertaClassificationHead(nn.Module):
    """Head for sentence-level classification tasks."""

    def __init__(self, config):
        super().__init__()
        self.dense = nn.Linear(config.hidden_size, config.hidden_size)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        self.out_proj = nn.Linear(config.hidden_size, config.num_labels)

    def forward(self, features, **kwargs):
        x = features[:, 0, :]  # take <s> token (equiv. to [CLS])
        x = self.dropout(x)
        x = self.dense(x)
        x = torch.tanh(x)
        x = self.dropout(x)
        x = self.out_proj(x)
        return x
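To put a number on "more than half a million parameters": assuming roberta-base (hidden_size = 768) and a 2-label task, the head alone has (768*768 + 768) + (768*2 + 2) = 592,130 trainable parameters, e.g.:

from transformers import RobertaConfig, RobertaForSequenceClassification

config = RobertaConfig.from_pretrained("roberta-base", num_labels=2)
model = RobertaForSequenceClassification(config)

# Count only the classification head's parameters.
head_params = sum(p.numel() for p in model.classifier.parameters())
print(head_params)  # 592130 = (768*768 + 768) + (768*2 + 2)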