Higher loss when resuming training from LLAMA 1B checkpoint

When loading the model from the checkpoint, the error I get is:

There were missing keys in the checkpoint model loaded: ['lm_head.weight']. 

I’m using the model for Causal LM.

I checked, and lm_head and the input embeddings are tied weights, so in theory this should be fine. However, when fine-tuning is resumed, it starts at a higher loss (~0.5) than where it left off (~0.42).
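
For reference, here is a minimal sketch of how the tie can be checked after loading (assuming the model is loaded with `AutoModelForCausalLM`; the checkpoint path is a placeholder):

```python
from transformers import AutoModelForCausalLM

# Placeholder path for the resumed checkpoint directory.
model = AutoModelForCausalLM.from_pretrained("path/to/checkpoint")

in_emb = model.get_input_embeddings().weight
out_emb = model.get_output_embeddings().weight  # lm_head.weight for causal LMs

# If the weights are tied, both tensors share the same storage.
print("tied:", in_emb.data_ptr() == out_emb.data_ptr())

# If they are not tied (e.g. lm_head was re-initialized because the key was
# missing from the checkpoint), re-tying explicitly restores the link.
if in_emb.data_ptr() != out_emb.data_ptr():
    model.tie_weights()
```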

What could be wrong?
