Why can't a checkpoint from an old version of BERT be used with a new version of BERT?

Hi,

I am trying to train a bert-base-uncased model on the MNLI dataset. I downloaded a checkpoint (from an old version of transformers, maybe v2.x) that was fine-tuned on MNLI, from “ishan/bert-base-uncased-mnli · Hugging Face”. However, when I load the checkpoint into the v4.x BERT, its performance is very low.


I load the checkpoint using the following code:

import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    model_args.model_name_or_path,
    from_tf=bool(".ckpt" in model_args.model_name_or_path),
    config=config,
    cache_dir=model_args.cache_dir,
    revision=model_args.model_revision,
    use_auth_token=True if model_args.use_auth_token else None,
)
# strict=False silently skips any checkpoint keys that do not match the
# model's parameter names, so renamed or missing keys are dropped without error.
model.load_state_dict(torch.load("./checkpoint/pytorch_model.bin"), strict=False)
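
Incidentally, with strict=False, load_state_dict returns the names of the parameters that failed to match, so the mismatch can be inspected directly. Here is a minimal diagnostic sketch (assuming a plain v4.x model; "bert-base-uncased" and num_labels=3 are stand-ins for the config used above):

import torch
from transformers import AutoModelForSequenceClassification

# Build a fresh v4.x model with the same architecture and label count (MNLI has 3).
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3
)

state_dict = torch.load("./checkpoint/pytorch_model.bin", map_location="cpu")

# With strict=False, load_state_dict reports which model parameters got no
# weights from the checkpoint and which checkpoint keys were ignored.
result = model.load_state_dict(state_dict, strict=False)
print("Missing keys (kept at random init):", result.missing_keys)
print("Unexpected keys (dropped from checkpoint):", result.unexpected_keys)

If either list is non-empty, those parameters never receive the fine-tuned weights and stay at their random initialization, which would match the low accuracy I am seeing.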