12% into epoch training loss drops to 0.0

I noticed a similar problem, but I was running on a much smaller model and dataset, so I was able to keep training for a while after the loss dropped to 0.0 (which happened very quickly).

In my experiments, this happened because there were NaN values in the model weights (and, oddly, in some parts of the loss as well). HF handles this by first outputting a sequence of zeros for the loss, and after a while it turns into NaN. You should be able to see this if you are tracking your gradients.
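
For reference, here's a minimal sketch of how you could scan for this (assuming a standard PyTorch model; `check_for_nans` is just a hypothetical helper name, not part of HF/TRL):

```python
import torch

def check_for_nans(model: torch.nn.Module) -> None:
    """Print any parameters or gradients containing NaN/Inf values.

    Call this after loss.backward() each step to catch the first
    point at which the weights or gradients go bad.
    """
    for name, param in model.named_parameters():
        # Check the weights themselves
        if torch.isnan(param).any() or torch.isinf(param).any():
            print(f"NaN/Inf in weights: {name}")
        # Check the gradients, if they exist for this parameter
        if param.grad is not None and (
            torch.isnan(param.grad).any() or torch.isinf(param.grad).any()
        ):
            print(f"NaN/Inf in gradients: {name}")
```

You could also turn on `torch.autograd.set_detect_anomaly(True)` to get a traceback at the op that first produces the NaN, though it slows training down a lot.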

Curious if you have made any progress on this?

See my issue here: TRL SFT super prone to nan when using data collator
