When loading the model from the checkpoint, the error I get is:
There were missing keys in the checkpoint model loaded: ['lm_head.weight'].
I’m using the model for Causal LM.
I checked, and lm_head and the input embeddings (input_embed) are tied weights, so in theory this should be fine. However, when fine-tuning resumes, it starts off with a higher loss (~0.5) than where it left off (~0.42).
What could be wrong?