Is there a way to correctly load a pre-trained transformers model without the configuration file?

You are absolutely correct: the checkpoint also includes the state of other components, not just the model weights. I hadn't noticed this! I checked the keys with the code below:

import torch

MODEL_PATH = "./aerobert/phase2_ckpt_4302592.pt"
keys = torch.load(MODEL_PATH).keys()
keys

Output: dict_keys(['model', 'optimizer', 'master params', 'files'])

If I look at the 'files' entry, it contains quite a few file paths, as below:

[3,
 '/local_workspace_data/bert/part-00879-of-00500.hdf5',
 '/local_workspace_data/bert/part-00562-of-00500.hdf5',
 '/local_workspace_data/bert/part-01703-of-00500.hdf5',
 '/local_workspace_data/bert/part-01706-of-00500.hdf5',
 …]
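
For reference, the 'model' entry can also be inspected on its own, without building a model first. This is just a quick sketch on my side; the map_location="cpu" argument and the key slicing are my own additions:

import torch

MODEL_PATH = "./aerobert/phase2_ckpt_4302592.pt"
ckpt = torch.load(MODEL_PATH, map_location="cpu")

model_state = ckpt["model"]                  # the actual weights
print(len(model_state), "parameter tensors")
print(list(model_state.keys())[:5])          # first few parameter names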

If I run your code below, it produces an error:

import torch
from transformers import AutoConfig, BertModel

MODEL_PATH = "./checkpoint.pt"
state_dict = torch.load(MODEL_PATH)["model"]
config = AutoConfig.from_pretrained("./bert_config.json")
model = BertModel(config)

model = BertModel._load_state_dict_into_model(
    model,
    state_dict,
    MODEL_PATH
)[0]

The error:

RuntimeError: Error(s) in loading state_dict for BertModel:
size mismatch for bert.embeddings.word_embeddings.weight: copying a param with shape torch.Size([30528, 1024]) from checkpoint, the shape in current model is torch.Size([30522, 1024]).

Does this mean the vocabulary of the saved model has 6 additional tokens (30528 instead of the standard 30522)?
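
If so, would something like the following be a reasonable workaround? It is just a sketch on my side that reuses your loading code, with the config's vocab_size bumped to match the checkpoint's embedding shape; I have not verified that the extra rows are merely padding:

import torch
from transformers import AutoConfig, BertModel

MODEL_PATH = "./aerobert/phase2_ckpt_4302592.pt"

state_dict = torch.load(MODEL_PATH, map_location="cpu")["model"]

config = AutoConfig.from_pretrained("./bert_config.json")
config.vocab_size = 30528  # match the [30528, 1024] word-embedding weight in the checkpoint

model = BertModel(config)
model = BertModel._load_state_dict_into_model(
    model,
    state_dict,
    MODEL_PATH,
)[0]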