T5ForConditionalGeneration checkpoint size mismatch #19418

I trained a T5ForConditionalGeneration model and saved the checkpoint to a .ckpt file with PyTorch Lightning’s Trainer. But when I try to load the state_dict back with model.load_state_dict(), I get this error:

RuntimeError: Error(s) in loading state_dict for T5ForConditionalGeneration:
        Unexpected key(s) in state_dict: "decoder.block.0.layer.1.EncDecAttention.relative_attention_bias.weight". 
        size mismatch for shared.weight: copying a param with shape torch.Size([32103, 512]) from checkpoint, the shape in current model is torch.Size([32128, 512]).
        size mismatch for encoder.embed_tokens.weight: copying a param with shape torch.Size([32103, 512]) from checkpoint, the shape in current model is torch.Size([32128, 512]).
        size mismatch for decoder.embed_tokens.weight: copying a param with shape torch.Size([32103, 512]) from checkpoint, the shape in current model is torch.Size([32128, 512]).
        size mismatch for lm_head.weight: copying a param with shape torch.Size([32103, 512]) from checkpoint, the shape in current model is torch.Size([32128, 512]).

I have not changed the model definition in any way, and the keys otherwise match, so I really don’t see how the sizes could end up mismatched at load time.
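For context, the mismatched dimension is the vocabulary axis. The shapes in the traceback (hidden size 512, 32128 rows) look like t5-small, whose embedding matrix is padded to 32128 rows, while 32103 is exactly the stock 32100-entry T5 vocabulary plus the three special tokens I add in the tokenizer below. A minimal diagnostic sketch, assuming the base checkpoint is t5-small:

    from transformers import T5Tokenizer, T5ForConditionalGeneration

    tokenizer = T5Tokenizer.from_pretrained("t5-small", bos_token="[bos]", eos_token="[eos]", sep_token="[sep]")
    model = T5ForConditionalGeneration.from_pretrained("t5-small")

    print(len(tokenizer))                                # 32103 = 32100 stock entries + the 3 added tokens
    print(model.get_input_embeddings().weight.shape[0])  # 32128, T5's padded embedding size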

Loading the model

This is how I’m loading the model:

    from transformers import T5Tokenizer, T5ForConditionalGeneration

    tokenizer = T5Tokenizer.from_pretrained(args["model_checkpoint"], bos_token="[bos]", eos_token="[eos]", sep_token="[sep]")
    model = T5ForConditionalGeneration.from_pretrained(args["model_checkpoint"], ignore_mismatched_sizes=True)
    # ckpt holds the Lightning checkpoint (torch.load on the .ckpt file);
    # k[6:] strips the "model." prefix that Lightning prepends to every key
    model.load_state_dict({k[6:]: v for k, v in ckpt["state_dict"].items()})
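One thing I considered: if the embeddings were resized to len(tokenizer) during training (the 32103 rows in the checkpoint suggest they were), then applying the same resize to the freshly loaded model before calling load_state_dict should make the shapes line up again. A sketch, reusing the tokenizer, args, and ckpt variables from above:

    # Hypothetical workaround: resize the fresh model to the extended
    # tokenizer so its embedding shapes match the checkpoint (32128 -> 32103)
    model = T5ForConditionalGeneration.from_pretrained(args["model_checkpoint"])
    model.resize_token_embeddings(len(tokenizer))
    model.load_state_dict({k[6:]: v for k, v in ckpt["state_dict"].items()}, strict=False)

The leftover decoder.block.0.layer.1.EncDecAttention.relative_attention_bias.weight key would still fail a strict load; it usually means the checkpoint was written with a different transformers version, and strict=False skips it, at the cost of silently ignoring any other non-matching keys.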

I even tried passing ignore_mismatched_sizes=True to the from_pretrained call, as shown above, and that didn’t help either. As far as I can tell, that flag only affects the weights from_pretrained itself loads; the subsequent load_state_dict call is plain PyTorch and ignores it.

It turned out that a different version of torch between the training and loading environments was causing this problem.
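For anyone who hits this, a quick first check is to compare the torch version of the environment that wrote the .ckpt with the one trying to load it:

    import torch
    print(torch.__version__)  # run in both environments and compare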