RobertaClassificationHead - reduce dense layer dimension?

I have a quick question regarding SequenceClassification with the RobertaClassificationHead: the dense layer on top of the transformer has config.hidden_size x config.hidden_size weights. From a theoretical point of view, would it make sense to let the user choose the dimension of the dense/projection layer? And if it does, what would be the best way to do this right now?

Something like this would probably do what I expect:

self.dense = nn.Linear(config.hidden_size, config.proj_dim)
self.dropout = nn.Dropout(config.hidden_dropout_prob)
self.out_proj = nn.Linear(config.proj_dim, config.num_labels)
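For a rough sense of the savings (using roberta-base's hidden_size of 768, a made-up proj_dim of 128, and 2 labels):

hidden_size, proj_dim, num_labels = 768, 128, 2

# Stock head: dense (hidden_size -> hidden_size) plus out_proj (hidden_size -> num_labels)
stock = (hidden_size * hidden_size + hidden_size) + (hidden_size * num_labels + num_labels)

# Proposed head: dense (hidden_size -> proj_dim) plus out_proj (proj_dim -> num_labels)
proposed = (hidden_size * proj_dim + proj_dim) + (proj_dim * num_labels + num_labels)

print(stock, proposed)  # 592130 vs. 98690 trainable parameters in the head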

I arrived at this question while experimenting with a RoBERTa model whose parameters are frozen, training only the classification head. In my case, training more than half a million parameters in the classification head seems like overkill for my small data set.
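For context, this is roughly how I freeze the encoder so that only the head is trained (just a sketch; the model name and label count are placeholders):

from transformers import RobertaForSequenceClassification

model = RobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

# Freeze the transformer encoder; only the classification head remains trainable.
for param in model.roberta.parameters():
    param.requires_grad = False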

Original RobertaClassificationHead code for reference:

import torch
import torch.nn as nn


class RobertaClassificationHead(nn.Module):
    """Head for sentence-level classification tasks."""

    def __init__(self, config):
        super().__init__()
        self.dense = nn.Linear(config.hidden_size, config.hidden_size)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        self.out_proj = nn.Linear(config.hidden_size, config.num_labels)

    def forward(self, features, **kwargs):
        x = features[:, 0, :]  # take <s> token (equiv. to [CLS])
        x = self.dropout(x)
        x = self.dense(x)
        x = torch.tanh(x)
        x = self.dropout(x)
        x = self.out_proj(x)
        return x
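
In the meantime, my best guess at a workaround is to subclass the head with a smaller projection and assign it to model.classifier. SmallClassificationHead and proj_dim=128 below are my own made-up name/value, not part of the library:

import torch
import torch.nn as nn


class SmallClassificationHead(nn.Module):
    """Same structure as RobertaClassificationHead, but with a configurable projection size."""

    def __init__(self, config, proj_dim=128):
        super().__init__()
        self.dense = nn.Linear(config.hidden_size, proj_dim)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        self.out_proj = nn.Linear(proj_dim, config.num_labels)

    def forward(self, features, **kwargs):
        x = features[:, 0, :]  # take <s> token (equiv. to [CLS])
        x = self.dropout(x)
        x = self.dense(x)
        x = torch.tanh(x)
        x = self.dropout(x)
        x = self.out_proj(x)
        return x


# Replace the stock head on an already loaded RobertaForSequenceClassification.
model.classifier = SmallClassificationHead(model.config, proj_dim=128)

Would assigning a custom head like this to model.classifier be the recommended approach, or is there a cleaner way?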