T5ForConditionalGeneration checkpoint size mismatch #19418

I trained a T5ForConditionalGeneration model and saved the checkpoint to a .ckpt file with PyTorch Lightning’s Trainer. But when I try to load the state_dict back with model.load_state_dict(), I get this error:

RuntimeError: Error(s) in loading state_dict for T5ForConditionalGeneration:
        Unexpected key(s) in state_dict: "decoder.block.0.layer.1.EncDecAttention.relative_attention_bias.weight". 
        size mismatch for shared.weight: copying a param with shape torch.Size([32103, 512]) from checkpoint, the shape in current model is torch.Size([32128, 512]).
        size mismatch for encoder.embed_tokens.weight: copying a param with shape torch.Size([32103, 512]) from checkpoint, the shape in current model is torch.Size([32128, 512]).
        size mismatch for decoder.embed_tokens.weight: copying a param with shape torch.Size([32103, 512]) from checkpoint, the shape in current model is torch.Size([32128, 512]).
        size mismatch for lm_head.weight: copying a param with shape torch.Size([32103, 512]) from checkpoint, the shape in current model is torch.Size([32128, 512]).

I have not changed the model definition in any way, and the keys otherwise match, so I really don’t see how the sizes could end up mismatched at load time.
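For context, the mismatched dimension is the vocabulary axis. The shapes in the traceback (hidden size 512, 32128 rows) look like t5-small, whose embedding matrix is padded to 32128 rows, while 32103 is exactly the stock 32100-entry T5 vocabulary plus the three special tokens I add in the tokenizer below. A minimal diagnostic sketch, assuming the base checkpoint is t5-small:

    from transformers import T5Tokenizer, T5ForConditionalGeneration

    tokenizer = T5Tokenizer.from_pretrained("t5-small", bos_token="[bos]", eos_token="[eos]", sep_token="[sep]")
    model = T5ForConditionalGeneration.from_pretrained("t5-small")

    print(len(tokenizer))                                # 32103 = 32100 stock entries + the 3 added tokens
    print(model.get_input_embeddings().weight.shape[0])  # 32128, T5's padded embedding size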

Loading the model

This is how I’m loading the model:

    from transformers import T5Tokenizer, T5ForConditionalGeneration

    tokenizer = T5Tokenizer.from_pretrained(args["model_checkpoint"], bos_token="[bos]", eos_token="[eos]", sep_token="[sep]")
    model = T5ForConditionalGeneration.from_pretrained(args["model_checkpoint"], ignore_mismatched_sizes=True)
    # ckpt holds the Lightning checkpoint (torch.load on the .ckpt file);
    # k[6:] strips the "model." prefix that Lightning prepends to every key
    model.load_state_dict({k[6:]: v for k, v in ckpt["state_dict"].items()})
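One thing I considered: if the embeddings were resized to len(tokenizer) during training (the 32103 rows in the checkpoint suggest they were), then applying the same resize to the freshly loaded model before calling load_state_dict should make the shapes line up again. A sketch, reusing the tokenizer, args, and ckpt variables from above:

    # Hypothetical workaround: resize the fresh model to the extended
    # tokenizer so its embedding shapes match the checkpoint (32128 -> 32103)
    model = T5ForConditionalGeneration.from_pretrained(args["model_checkpoint"])
    model.resize_token_embeddings(len(tokenizer))
    model.load_state_dict({k[6:]: v for k, v in ckpt["state_dict"].items()}, strict=False)

The leftover decoder.block.0.layer.1.EncDecAttention.relative_attention_bias.weight key would still fail a strict load; it usually means the checkpoint was written with a different transformers version, and strict=False skips it, at the cost of silently ignoring any other non-matching keys.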

I even tried passing ignore_mismatched_sizes=True to the from_pretrained call, as shown above, and that didn’t help either. As far as I can tell, that flag only affects the weights from_pretrained itself loads; the subsequent load_state_dict call is plain PyTorch and ignores it.

It turned out that a different version of torch between the training and loading environments was causing this problem.
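For anyone who hits this, a quick first check is to compare the torch version of the environment that wrote the .ckpt with the one trying to load it:

    import torch
    print(torch.__version__)  # run in both environments and compare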