Naming inconsistency in Distilbert config

dropout05 · November 25, 2020, 11:08pm

Hi!
I noticed an inconsistency between the DistilBERT and BERT configs. The DistilBERT config stores the output hidden size as dim and the FFN dimension as hidden_dim, while BERT and RoBERTa use hidden_size for the output and intermediate_size for the FFN dimension.

I know such a thing can be hard to fix without breaking backward compatibility, but this behavior makes it a bit harder to get your model's output size upfront.
E.g., if I want to be able to use both DistilBERT and BERT as an encoder in my model like this:

from torch import nn

class MySuperCustomModel(nn.Module):
    def __init__(self, encoder, n_classes):
        super().__init__()
        self.encoder = encoder
        hidden_size = ...  # I wish it would be as simple as encoder.config.hidden_size
        self.logit_network = nn.Linear(hidden_size, n_classes)

the code to get the encoder output size is kind of ugly, because you need to use isinstance or something like it.
Of course, for classification you can use *ModelForClassification, but what if you want to use a pre-trained model as a seq2seq encoder or write some other custom model?
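
To make the workaround concrete, here is a minimal sketch of the kind of helper this currently requires. It assumes the output width lives under either hidden_size (BERT-style) or dim (DistilBERT-style); get_output_size is a hypothetical name, and the two SimpleNamespace objects are stand-ins for real configs so the snippet runs without downloading anything.

```python
from types import SimpleNamespace


def get_output_size(config):
    """Return the encoder output width under whichever name the config uses.

    Hypothetical helper: it only knows the two attribute names discussed
    in this thread and raises for anything else.
    """
    for name in ("hidden_size", "dim"):
        value = getattr(config, name, None)
        if value is not None:
            return value
    raise AttributeError("config has neither 'hidden_size' nor 'dim'")


# Stand-ins for real config objects (assumed attribute layouts).
bert_like = SimpleNamespace(hidden_size=768, intermediate_size=3072)
distilbert_like = SimpleNamespace(dim=768, hidden_dim=3072)

bert_size = get_output_size(bert_like)
distilbert_size = get_output_size(distilbert_like)
```

With a real encoder you would pass encoder.config instead of the stand-ins. It works, but every new config naming scheme means another entry in that tuple, which is exactly the kind of bookkeeping I'd like to avoid.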

I feel like solving this issue would make quite a few people a bit happier, as they would be able to experiment with different pre-trained models without modifying their code and without thinking about the differences between transformer configs.

Is there a better way to get the output dimension of the model or any fixes planned? I can help with a PR too.

sgugger · November 30, 2020, 2:00pm

You can make a PR with new properties for those configs (like hidden_size for DistilBert) but we can’t change the name of the arguments of the configs as it would be a severe breaking change.
I agree that consistently named properties would be useful!
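
As a rough illustration of the idea, the sketch below adds read-only alias properties on top of the existing argument names, so nothing is renamed. ToyDistilConfig is a hypothetical stand-in, not the actual library class, and the default values are only placeholders.

```python
class ToyDistilConfig:
    """Hypothetical stand-in for a config that stores its sizes as dim / hidden_dim."""

    def __init__(self, dim=768, hidden_dim=3072):
        # Original argument names stay untouched, so existing code keeps working.
        self.dim = dim
        self.hidden_dim = hidden_dim

    @property
    def hidden_size(self):
        # Alias matching the BERT-style name for the output width.
        return self.dim

    @property
    def intermediate_size(self):
        # Alias matching the BERT-style name for the FFN width.
        return self.hidden_dim


config = ToyDistilConfig(dim=512, hidden_dim=2048)
output_width = config.hidden_size
ffn_width = config.intermediate_size
```

Because the aliases are properties without setters, they read through to the original attributes and cannot drift out of sync with them.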
