I’m working with a DistilBERT model. I added a few new vocab tokens to the tokenizer and trained the model:
tokenizer.add_tokens(["[NEW_TOKEN]"], special_tokens=True)  # register the new tokens
model.resize_token_embeddings(len(tokenizer))  # grow the embedding matrix to the new vocab size
tokenizer.save_pretrained(args.tokenizer_dir)  # persist the updated tokenizer
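For context, the full flow looks roughly like this (a minimal sketch: the model name and directory strings are placeholders for my actual setup, and BERTMODEL in my code is essentially the DistilBERT class with the MLM head):

from transformers import AutoTokenizer, DistilBertForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = DistilBertForMaskedLM.from_pretrained("distilbert-base-uncased")

tokenizer.add_tokens(["[NEW_TOKEN]"], special_tokens=True)
model.resize_token_embeddings(len(tokenizer))  # resizes the input embeddings and the vocab_projector

# ... fine-tuning happens here; the checkpoint at checkpoint_path is written during training ...

tokenizer.save_pretrained("tokenizer_dir")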
Everything works fine, but I’m hitting an issue when loading the trained model from a checkpoint:
model = BERTMODEL.from_pretrained(checkpoint_path)
RuntimeError: Error(s) in loading state_dict for BERTMODEL:
size mismatch for vocab_projector.weight: copying a param with shape torch.Size([30522, 768]) from checkpoint, the shape in current model is torch.Size([30524, 768]).
The size mismatch (30522 vs. 30524) is because I added 2 new tokens to the vocab. I’m not sure how to pass the new vocab size/config when loading the model from a checkpoint:
model = BERTMODEL.from_pretrained(checkpoint_path) # <--- ????
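For example, I imagined something along these lines might work, but both the config override and ignore_mismatched_sizes are guesses on my part (checkpoint_path, tokenizer, and BERTMODEL are the same objects as above):

from transformers import AutoConfig

# Guess 1: force the config to the new vocab size before loading
config = AutoConfig.from_pretrained(checkpoint_path, vocab_size=len(tokenizer))
model = BERTMODEL.from_pretrained(checkpoint_path, config=config)

# Guess 2: allow mismatched shapes, then resize -- but this re-initializes the
# mismatched weights, which would throw away the trained vocab_projector values
model = BERTMODEL.from_pretrained(checkpoint_path, ignore_mismatched_sizes=True)
model.resize_token_embeddings(len(tokenizer))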
Any hints on what is missing in my code?