Why does ignore_mismatched_sizes increase the number of TFAlbertMainLayer parameters?

If I load the model from pretrained without much in the way of configs, I get about 11 million parameters in the Albert main layer. If I load it with the problem type changed and ignore_mismatched_sizes set, the main layer has 222 million parameters. This seems strange to me; I thought changing the problem type would only affect the classifier?

from transformers import AlbertConfig, TFAlbertForSequenceClassification

model = TFAlbertForSequenceClassification.from_pretrained('albert-base-v2', config=AlbertConfig(problem_type="single_label_classification"), ignore_mismatched_sizes=True)

Hi! The problem here is that a freshly constructed AlbertConfig() has default sizes that are totally different from the ones in albert-base-v2 (the defaults correspond to the much larger xxlarge architecture, which is where your ~222 million parameters come from). With ignore_mismatched_sizes=True, every mismatched pretrained weight is discarded and the bigger main layer is randomly initialized instead. If you’d like to train a sequence classification model on top of Albert, you can just do:
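You can see the mismatch directly by comparing the default config against the checkpoint’s config. A quick sketch (the exact attribute values depend on your transformers version):

from transformers import AlbertConfig

default_cfg = AlbertConfig()  # library defaults, much larger than albert-base-v2
base_cfg = AlbertConfig.from_pretrained('albert-base-v2')  # the checkpoint's real sizes

# Compare the dimensions that drive the parameter count
for name in ("hidden_size", "intermediate_size", "num_attention_heads"):
    print(name, getattr(default_cfg, name), "vs", getattr(base_cfg, name))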

model = TFAlbertForSequenceClassification.from_pretrained('albert-base-v2', num_labels=2)
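As a sanity check, you can inspect the per-layer parameter counts after loading. A sketch, assuming the main layer is registered under the name "albert" (which is how current transformers versions name it):

# The backbone should be base-sized (~11M params), not ~222M
model.summary()
print(model.get_layer("albert").count_params())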

Alternatively, if you want to use a config object, you should initialize it from albert-base-v2 like this:

model = TFAlbertForSequenceClassification.from_pretrained('albert-base-v2', config=AlbertConfig.from_pretrained('albert-base-v2', problem_type="single_label_classification"))
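If you also need to change the number of labels, both overrides can go through the same from_pretrained call; since the config carries the checkpoint’s dimensions, only the classifier head is freshly initialized (num_labels=2 below is a placeholder for your actual label count):

from transformers import AlbertConfig, TFAlbertForSequenceClassification

config = AlbertConfig.from_pretrained(
    'albert-base-v2',
    problem_type="single_label_classification",
    num_labels=2,  # placeholder; set this to your task's label count
)
model = TFAlbertForSequenceClassification.from_pretrained('albert-base-v2', config=config)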