I trained a T5ForConditionalGeneration model and saved the checkpoint to a .ckpt file using PyTorch Lightning's Trainer. But when I try to load the weights back with model.load_state_dict(), I get this error:
RuntimeError: Error(s) in loading state_dict for T5ForConditionalGeneration:
Unexpected key(s) in state_dict: "decoder.block.0.layer.1.EncDecAttention.relative_attention_bias.weight".
size mismatch for shared.weight: copying a param with shape torch.Size([32103, 512]) from checkpoint, the shape in current model is torch.Size([32128, 512]).
size mismatch for encoder.embed_tokens.weight: copying a param with shape torch.Size([32103, 512]) from checkpoint, the shape in current model is torch.Size([32128, 512]).
size mismatch for decoder.embed_tokens.weight: copying a param with shape torch.Size([32103, 512]) from checkpoint, the shape in current model is torch.Size([32128, 512]).
size mismatch for lm_head.weight: copying a param with shape torch.Size([32103, 512]) from checkpoint, the shape in current model is torch.Size([32128, 512]).
I have not changed the model definition in any way, and the keys match, so I don't understand how the sizes could suddenly mismatch at load time.
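One thing I did notice, though I don't see why it would only matter when loading: 32103 is exactly what len(tokenizer) becomes after the three special tokens are added (32100 + 3), while 32128 is the embedding size of the stock pretrained checkpoint. A minimal check is below; it uses plain t5-small as a stand-in for my base model (the 512 hidden size matches it):

from transformers import T5Tokenizer, T5ForConditionalGeneration

tok = T5Tokenizer.from_pretrained("t5-small", bos_token="[bos]", eos_token="[eos]", sep_token="[sep]")
mdl = T5ForConditionalGeneration.from_pretrained("t5-small")
print(len(tok))               # 32103: 32100 base tokens + the 3 added special tokens
print(mdl.config.vocab_size)  # 32128: the padded embedding size of the pretrained weights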
Loading the model
This is how I’m loading the model:
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained(args["model_checkpoint"], bos_token="[bos]", eos_token="[eos]", sep_token="[sep]")
model = T5ForConditionalGeneration.from_pretrained(args["model_checkpoint"], ignore_mismatched_sizes=True)
ckpt = torch.load("path/to/checkpoint.ckpt", map_location="cpu")  # placeholder path to the Lightning .ckpt
model.load_state_dict({k[6:]: v for k, v in ckpt["state_dict"].items()})  # k[6:] strips Lightning's "model." prefix
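For what it's worth, the checkpoint contents look sane when I inspect them; the keys just carry the "model." prefix from my LightningModule (hence the k[6:]), and the embedding weights already have the smaller shape:

print(list(ckpt["state_dict"].keys())[:2])
# e.g. ['model.shared.weight', 'model.encoder.embed_tokens.weight']
print(ckpt["state_dict"]["model.shared.weight"].shape)
# torch.Size([32103, 512])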
I even tried passing ignore_mismatched_sizes=True to the from_pretrained call, and that didn't help either.
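Is the right approach to resize the fresh model's embeddings down to the tokenizer length before loading the weights? This is just a sketch of what I'm considering, assuming the size mismatch really does come from the three added special tokens:

model = T5ForConditionalGeneration.from_pretrained(args["model_checkpoint"])
model.resize_token_embeddings(len(tokenizer))  # 32128 -> 32103, to match the shapes stored in the checkpoint
model.load_state_dict({k[6:]: v for k, v in ckpt["state_dict"].items()})

And would the unexpected EncDecAttention.relative_attention_bias key then need load_state_dict(..., strict=False), or is that a separate problem?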