Hi everyone,

I’m currently trying to see whether BERT’s performance on a binary classification task improves after first fine-tuning it on another task (regression).

My pipeline is:

- bert base → fine-tune on regression → fine-tune on classification → test

vs

- bert base → fine-tune on classification → test

For all the steps I’m using run_glue_no_trainer.py (with minor modifications), found in transformers/examples/pytorch/text-classification.

I have a problem loading the regression-fine-tuned BERT weights into AutoModelForSequenceClassification. Looking through GitHub issues I found the parameter `ignore_mismatched_sizes=True`, which lets me load the weights without an error, but then I get this runtime error during training:

```
RuntimeError: The size of tensor a (2) must match the size of tensor b (8) at non-singleton dimension 1
```

I assume the problem is that `ignore_mismatched_sizes` is meant for going from one classification head to another with a different number of classes, so it loads the weights anyway (but the regression output for a sample has size 1, while for binary classification it has size 2).
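If I read the error correctly, it looks like an MSE-style (regression) loss being applied to `(batch, 2)` classification logits against `(batch,)` labels — maybe because the regression checkpoint’s config still carries the regression loss setting. A minimal torch sketch (batch size 8 is my assumption, matching the “8” in the error) reproduces the exact same message:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(8, 2)                  # classification head output: (batch=8, num_labels=2)
labels = torch.randint(0, 2, (8,)).float()  # flat targets, as a regression loss would expect: (batch=8,)

try:
    # MSE loss applied as if the head were still doing regression
    F.mse_loss(logits, labels)
except RuntimeError as e:
    print(e)  # The size of tensor a (2) must match the size of tensor b (8) at non-singleton dimension 1
```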

Given that I’m not interested in keeping the regression head, how can I take only the BERT backbone weights from my regression-fine-tuned model?

I tried:

```python
from transformers import BertConfig, BertModel

config = BertConfig.from_pretrained(BERT_BASE)
model = BertModel.from_pretrained(REGRESSION_FINE_TUNED_BERT, config=config)
model.save_pretrained(...)

Using this model to initialize BertForSequenceClassification, I don’t get errors, but I do get this warning, which I wasn’t expecting since I should have removed the regression/classification head:

```
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at /home/irene/similarity_eval/preprocessing/bert_similarity and are newly initialized: ['classifier.weight', 'classifier.bias']
```
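To double-check, I compared the parameter names of a bare `BertModel` with those of `BertForSequenceClassification` (using a tiny random config just for the structural comparison, no downloads — the real models would use the bert-base config):

```python
from transformers import BertConfig, BertModel, BertForSequenceClassification

# Tiny config so this runs instantly; shapes don't matter for comparing parameter names.
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64, num_labels=2)

backbone = BertModel(config)                 # what my save_pretrained(...) checkpoint contains
clf = BertForSequenceClassification(config)  # what I then initialize from that checkpoint

backbone_keys = set(backbone.state_dict())
# Parameters of the classification model that a BertModel checkpoint can never provide
# (backbone params live under the "bert." prefix in the classification model):
missing = {k for k in clf.state_dict()
           if k.removeprefix("bert.") not in backbone_keys and k not in backbone_keys}
print(sorted(missing))  # ['classifier.bias', 'classifier.weight']
```

So `classifier.weight` and `classifier.bias` simply don’t exist in a `BertModel` checkpoint — maybe the warning is just telling me the new head is (correctly) randomly initialized, and is expected?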

Any advice?

Thanks in advance!!