Problems when loading checkpoints

I was distilling my student model (base model t5-small) from a fine-tuned T5-xxl. Here is how I load the student:
student_model = AutoModelForSeq2SeqLM.from_pretrained(
    args.student_model_name_or_path,
    torch_dtype=torch.float32,
    device_map="auto",
    cache_dir=args.cache_dir,
    quantization_config=quantization_config,
)
I saved the trained model using

output_dir = f"checkpoint"
student_model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)

But when I try to load the checkpoint using

tokenizer = AutoTokenizer.from_pretrained(args.model_path)
model = AutoModelForSeq2SeqLM.from_pretrained(
    "checkpoint",
    torch_dtype=torch.float32,
    device_map="auto",
    cache_dir=args.cache_dir,
    quantization_config=quantization_config,
)

it warns "Some weights of the model checkpoint at checkpoints were not used when initializing T5ForConditionalGeneration: " and the model output is a complete mess. I have been trying to figure it out but have no clues so far.


You might have forgotten to change the parameters in the config, or something went wrong in the distillation process.
The warning you are seeing means there is a mismatch between the parameters of the model and the checkpoint.
Try comparing the model's state_dict with the checkpoint file to figure out where the mismatch is happening, e.g. with the key comparison sketched below.
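A minimal sketch of that comparison (assuming the student base is t5-small as in the post above and the checkpoint lives in "checkpoint"; the weights filename depends on your transformers version):

import os
import torch
from safetensors.torch import load_file
from transformers import AutoModelForSeq2SeqLM

# Fresh copy of the student architecture
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
model_keys = set(model.state_dict().keys())

# Weights actually stored in the checkpoint; newer transformers versions
# write model.safetensors, older ones write pytorch_model.bin
ckpt_dir = "checkpoint"
st_path = os.path.join(ckpt_dir, "model.safetensors")
if os.path.exists(st_path):
    state_dict = load_file(st_path)
else:
    state_dict = torch.load(os.path.join(ckpt_dir, "pytorch_model.bin"), map_location="cpu")
ckpt_keys = set(state_dict.keys())

print("saved but unused by the model:", sorted(ckpt_keys - model_keys))
print("expected by the model but missing:", sorted(model_keys - ckpt_keys))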
This link might also help you with the distillation process:


The problem is that model.save_pretrained() can't save the weights properly if you loaded the model with load_in_8bit quantization.
Check this link if you need to save weights loaded in 8-bit.
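As a workaround, here is a sketch (my assumption: the t5-small student fits in memory in fp32, so only the teacher needs quantization). Load the student without the quantization_config, and save_pretrained will then write an ordinary fp32 checkpoint that reloads cleanly:

import torch
from transformers import AutoModelForSeq2SeqLM

# Student in plain fp32 -- no quantization_config, so save_pretrained
# writes a regular checkpoint
student_model = AutoModelForSeq2SeqLM.from_pretrained(
    args.student_model_name_or_path,
    torch_dtype=torch.float32,
    device_map="auto",
    cache_dir=args.cache_dir,
)

# ... distillation training ...

student_model.save_pretrained("checkpoint")

# Reload the same way, again without quantization_config
model = AutoModelForSeq2SeqLM.from_pretrained("checkpoint", torch_dtype=torch.float32)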
