I am distilling a student model (base model t5-small) from a fine-tuned T5-xxl teacher. Here is how the student is loaded:
student_model = AutoModelForSeq2SeqLM.from_pretrained(
    args.student_model_name_or_path,
    torch_dtype=torch.float32,
    device_map="auto",
    cache_dir=args.cache_dir,
    quantization_config=quantization_config,
)
I saved the trained model with:
output_dir = "checkpoint"
student_model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)
But when I try to load the checkpoint using
tokenizer = AutoTokenizer.from_pretrained(args.model_path)
model = AutoModelForSeq2SeqLM.from_pretrained(
    "checkpoint",  # note: a comma is required after the path argument
    torch_dtype=torch.float32,
    device_map="auto",
    cache_dir=args.cache_dir,
    quantization_config=quantization_config,
)
I get the warning "Some weights of the model checkpoint at checkpoints were not used when initializing T5ForConditionalGeneration: " and the model's output is garbage. I have been trying to figure this out but have no clues so far.
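For context on what I understand the warning to mean: it appears when parameter names in the saved state dict do not match the parameter names of the model class being initialized, so some saved tensors are dropped and some model weights stay randomly initialized (which would explain the garbage output). A minimal torch-only sketch of that failure mode, with made-up toy module names (not T5's real ones):

```python
import torch
import torch.nn as nn

# Two toy modules whose parameter names only partially overlap,
# mimicking a checkpoint saved from one architecture and loaded
# into a class that expects different names (hypothetical names).
class Saved(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(4, 4)
        self.lm_head = nn.Linear(4, 4)

class Loaded(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(4, 4)
        self.decoder = nn.Linear(4, 4)

ckpt = Saved().state_dict()
result = Loaded().load_state_dict(ckpt, strict=False)

# "Some weights ... were not used" corresponds to unexpected_keys;
# missing_keys are left randomly initialized, which scrambles outputs.
print("unexpected:", result.unexpected_keys)
print("missing:", result.missing_keys)
```

If that is what is happening here, one thing I could check is whether the key names in the saved checkpoint actually match what T5ForConditionalGeneration expects, and whether saving with quantization_config active changes the saved tensor names.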