Forcing BERT hidden dimension size

Hi. I am doing a parameter study to investigate the effect of different hidden dimension sizes in pretrained models. I've successfully used the code below for RoBERTa and MentalBERT, but I can't seem to get ignore_mismatched_sizes to work for bert-base-uncased. Even though it is already set in the code, the RuntimeError I receive still says

You may consider adding ignore_mismatched_sizes=True in the model from_pretrained method.

I have replicated the issue using Colab with this code:

from transformers import AutoConfig, AutoModelForSequenceClassification, AutoModel

# checkpoint = "bert-base-uncased"   #Throws RuntimeError
checkpoint = "roberta-base"

num_class = 3
args = {"hidden_size": 48}

config = AutoConfig.from_pretrained(checkpoint, num_labels=num_class, **args)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, config=config, ignore_mismatched_sizes=True)

To add, using it for RoBERTa just throws a lot of warnings but nonetheless returns the model with the forced hidden dimension. BERT, on the other hand, throws the error above.

I'm quite confused as to why it works for the other models but not for BERT. Any insight on how to force BERT's hidden dimension size is greatly appreciated.

I have never heard of this kind of dimension reduction at the hidden layer.
BERT outputs 768-dimensional hidden states and that's all you get… no more and no less.
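
For reference, here is a minimal sketch (assuming bert-base-uncased and the standard AutoModel/AutoTokenizer API) that shows the encoder output size is fixed at 768:

from transformers import AutoModel, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("hello world", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Shape is (batch, seq_len, 768) for bert-base-uncased
print(outputs.last_hidden_state.shape)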

I know of 3 possible ways around this though:

  1. Add an additional layer that takes the 768-dimensional output and projects it to the size you want, say [32, 64, 128, 256, 512], and fine-tune all versions of the model on the same dataset. Do it multiple times and average the results (see the sketch after this list).
  2. Randomly pick N dimensions to keep. Reshuffle this selection for every model size, multiple times.
  3. Use a dimensionality reduction algorithm such as PCA as the last step in a pipeline. Again, do it multiple times.
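
A minimal sketch of option 1, assuming a sequence classification setup; the projection size of 128, the label count of 3, and the module names are placeholders rather than anything from the original post:

import torch
import torch.nn as nn
from transformers import AutoModel

class BertWithProjection(nn.Module):
    # BERT encoder followed by a linear projection down to a smaller "hidden" size,
    # with a classification head on top of the projected [CLS] representation.
    def __init__(self, checkpoint="bert-base-uncased", proj_dim=128, num_labels=3):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(checkpoint)
        hidden = self.encoder.config.hidden_size          # 768 for bert-base-uncased
        self.proj = nn.Linear(hidden, proj_dim)           # 768 -> proj_dim
        self.classifier = nn.Linear(proj_dim, num_labels)

    def forward(self, input_ids, attention_mask=None):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]                 # [CLS] token vector
        return self.classifier(torch.relu(self.proj(cls)))

You would then repeat the fine-tuning for each proj_dim in [32, 64, 128, 256, 512] and average the scores, as described in option 1.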