MobileBERT unused weights after upgrading from transformers 3.5.1 to 4.15.0

Hi, I’m relatively new to the transformers library. I’m trying to update my project’s dependencies from the old v3.5.1 to v4.15.0, but after doing so the pretrained MobileBERT model started emitting warnings about unused weights. To narrow the problem down, I ran the example code from the documentation, which gives the same result:

>>> from transformers import MobileBertModel
>>> model = MobileBertModel.from_pretrained('google/mobilebert-uncased')
Some weights of the model checkpoint at google/mobilebert-uncased were not used when initializing MobileBertModel: ['cls.predictions.dense.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing MobileBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing MobileBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

And the warning has a point: the model does not converge at all during fine-tuning. This wasn’t a problem in v3.5.1. What changed, and what do I need to do on my side to adapt?
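From reading around, my rough mental model (a simplified sketch, not the actual library code) is that from_pretrained compares the checkpoint’s parameter names against the names the model class defines, and reports any checkpoint keys the model has no matching parameter for. The bare MobileBertModel has no cls.* pretraining heads, so those checkpoint entries come out as “unused”. The key names below are abbreviated examples, not the full checkpoint:

```python
# Simplified sketch of how "unused checkpoint weights" are detected.
# Illustrative only; key names are abbreviated examples, not the full
# google/mobilebert-uncased checkpoint.
checkpoint_keys = {
    "embeddings.word_embeddings.weight",  # present in the bare encoder
    "cls.predictions.dense.weight",       # MLM pretraining head
    "cls.seq_relationship.weight",        # NSP pretraining head
}
model_keys = {
    "embeddings.word_embeddings.weight",  # bare MobileBertModel only
}

# Checkpoint entries with no matching model parameter are reported
# as "not used when initializing" the model.
unused = sorted(checkpoint_keys - model_keys)
print(unused)
# → ['cls.predictions.dense.weight', 'cls.seq_relationship.weight']
```

That matches the warning text, but it doesn’t explain why fine-tuning behaved differently between v3.5.1 and v4.15.0, which is the part I’m stuck on.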

Thanks in advance.