Higher loss when resuming training from LLAMA 1B checkpoint

When loading the model from the checkpoint, the error I get is:

There were missing keys in the checkpoint model loaded: ['lm_head.weight']. 

I’m using the model for Causal LM.

I checked, and lm_head and the input embeddings are tied weights, so in theory this should be fine. However, when fine-tuning is resumed, it starts at a higher loss (~0.5) than where it left off (~0.42).
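
For reference, here is a minimal sketch of how the tie can be checked after loading (assuming the model is loaded with `AutoModelForCausalLM`; the checkpoint path is a placeholder):

```python
from transformers import AutoModelForCausalLM

# Placeholder path for the resumed checkpoint directory.
model = AutoModelForCausalLM.from_pretrained("path/to/checkpoint")

in_emb = model.get_input_embeddings().weight
out_emb = model.get_output_embeddings().weight  # lm_head.weight for causal LMs

# If the weights are tied, both tensors share the same storage.
print("tied:", in_emb.data_ptr() == out_emb.data_ptr())

# If they are not tied (e.g. lm_head was re-initialized because the key was
# missing from the checkpoint), re-tying explicitly restores the link.
if in_emb.data_ptr() != out_emb.data_ptr():
    model.tie_weights()
```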

What could be wrong?
