Loading trained model with new vocab

I’m working with a DistilBERT model. I added a couple of new vocab tokens and trained the model:

        tokenizer.add_tokens(["[NEW_TOKEN]"], special_tokens=True)
        model.resize_token_embeddings(len(tokenizer))
        tokenizer.save_pretrained(args.tokenizer_dir)

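For context, the rest of the flow is roughly as follows (simplified; the base checkpoint name and output directory are placeholders for my setup):

        from transformers import AutoTokenizer, AutoModelForMaskedLM

        # base checkpoint name is a placeholder for my setup
        tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
        model = AutoModelForMaskedLM.from_pretrained("distilbert-base-uncased")

        # ... add_tokens() / resize_token_embeddings() as above, then training ...

        # saving the resized model also writes a config.json whose vocab_size
        # reflects the new embedding size (args.output_dir is a placeholder)
        model.save_pretrained(args.output_dir)
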
Everything works fine, but I’m hitting an issue when loading the trained model from a checkpoint:

model = BERTMODEL.from_pretrained(checkpoint_path)

RuntimeError: Error(s) in loading state_dict for BERTMODEL:
    size mismatch for vocab_projector.weight: copying a param with shape torch.Size([30522, 768]) from checkpoint, the shape in current model is torch.Size([30524, 768]).

The size mismatch (30522 vs 30524) is because I added 2 new tokens to the vocab. I’m not sure how to pass the new vocab config when loading the model from a checkpoint:

model = BERTMODEL.from_pretrained(checkpoint_path) # <--- ????

Any hints on what is missing in my code?

I don’t know if this will help, but you could pass ignore_mismatched_sizes=True when loading the model.
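
Something along these lines (untested sketch; I’m assuming a DistilBERT masked-LM checkpoint and that you reload the tokenizer you saved):

        from transformers import AutoTokenizer, AutoModelForMaskedLM

        # tokenizer you saved earlier, containing the added tokens
        tokenizer = AutoTokenizer.from_pretrained(args.tokenizer_dir)

        # ignore_mismatched_sizes=True skips (and freshly initializes) any weights
        # whose shapes don't match, instead of raising the size-mismatch error
        model = AutoModelForMaskedLM.from_pretrained(
            checkpoint_path,
            ignore_mismatched_sizes=True,
        )

        # make the embedding matrices match the tokenizer length (no-op if they already do)
        model.resize_token_embeddings(len(tokenizer))

The caveat is that mismatched matrices are re-initialized rather than loaded from the checkpoint, so if the checkpoint actually holds trained embeddings for the new tokens, this would throw them away.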

I’m running into the same issue: I added some new tokens to the tokenizer and resized the embeddings with

        model.resize_token_embeddings(len(tokenizer))

Training and saving the model work fine, but the problem happens when I try to load the model. The error raised is:

size mismatch for text_model.model.embed_tokens.weight: copying a param with shape torch.Size([65048, 2048]) from checkpoint, the shape in current model is torch.Size([65037, 2048]).
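
I’m guessing the embeddings need to be resized to the saved tokenizer’s length before the checkpoint weights are loaded, something along these lines (untested; the class name, paths, and checkpoint filename are just placeholders for my setup):

        import torch
        from transformers import AutoTokenizer, AutoModelForCausalLM

        tokenizer = AutoTokenizer.from_pretrained(tokenizer_dir)       # tokenizer saved with the added tokens
        model = AutoModelForCausalLM.from_pretrained(base_model_name)  # base model with the original vocab
        model.resize_token_embeddings(len(tokenizer))                  # grow the embeddings to the new vocab size

        # load the fine-tuned weights now that the shapes match the checkpoint
        # (adjust the filename to however the checkpoint was saved)
        state_dict = torch.load(f"{checkpoint_path}/pytorch_model.bin", map_location="cpu")
        model.load_state_dict(state_dict)

but I’m not sure this is the right approach.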

Can anyone help, please?
Thanks a lot in advance!