Loss becoming nearly zero in first 5K steps when training LM from scratch

I am training the ALBERT LM model from scratch.

I have already trained it for Hindi and Bangla and it worked fine, but when I train on Gujarati and Telugu, the loss drops to nearly zero within 5K steps.

What could be the reason for the sudden drop in the loss, and how would I debug such an issue? Any suggestions?


Usually, this means you are training on your validation data, so I’d triple-check your training set and validation set don’t contain the same texts, or that there is no leak from one to the other.
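If you want to rule out a leak quickly, an exact-match check between the two corpora is often enough. A minimal sketch, assuming one example per line in two hypothetical plain-text files, `train.txt` and `valid.txt`:

```python
# Hypothetical file names; adjust to your corpus layout.
def load_lines(path):
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}

train_lines = load_lines("train.txt")
valid_lines = load_lines("valid.txt")

overlap = train_lines & valid_lines
print(f"{len(overlap)} of {len(valid_lines)} validation lines also appear in training")
```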

I am not using any validation data. I masked the raw corpus after SentencePiece tokenization and passed it to training.

Then it’s just the model learning your training set. If you really want to know how it would fare on new data, you need to use a validation set and compute the loss on it.
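Evaluation doesn’t have to be heavy: even a small held-out slice gives a usable signal. A minimal sketch with the `transformers` Trainer, assuming a tokenized `datasets.Dataset` named `dataset` and an ALBERT `model` and `tokenizer` already loaded (all names here are illustrative):

```python
from transformers import Trainer, TrainingArguments, DataCollatorForLanguageModeling

# Hold out a small slice of the training corpus for evaluation.
split = dataset.train_test_split(test_size=0.01, seed=42)

# Standard MLM masking; 15% is the usual default.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

args = TrainingArguments(
    output_dir="albert-from-scratch",
    evaluation_strategy="steps",  # compute eval loss during training
    eval_steps=1000,
    logging_steps=100,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=split["train"],
    eval_dataset=split["test"],
    data_collator=collator,
)
trainer.train()
```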

I used an eval set for Hindi, but evaluation was taking a lot of time, so I went without eval data.
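If full evaluation is too slow, a common workaround is to evaluate on a small fixed subsample rather than dropping evaluation entirely. A sketch reusing the hypothetical `split` and `trainer` names from above:

```python
# Evaluate on a few thousand held-out examples instead of the full set.
small_eval = split["test"].shuffle(seed=42).select(
    range(min(5000, len(split["test"])))
)
metrics = trainer.evaluate(eval_dataset=small_eval)
print(metrics["eval_loss"])
```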

But without eval data you don’t know how your model actually performs. You may be overfitting the training dataset.

Agreed. But what could be the cause of the sudden drop in loss?
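One way to narrow it down is to sanity-check what the model actually sees: if the masking step produces almost no masked positions, or if tokenization collapses most of the corpus to a handful of ids (e.g., `<unk>`), the MLM objective becomes trivially easy and the loss can collapse. A sketch reusing the hypothetical `collator` and `split` names from above, assuming the dataset holds only tokenized columns:

```python
from collections import Counter

# Collate a small batch exactly as training would.
batch = collator([split["train"][i] for i in range(8)])

# Labels of -100 are ignored by the loss; if almost nothing is masked,
# there is barely anything to predict.
print("masked positions per example:", (batch["labels"] != -100).sum(dim=1).tolist())

# If a few ids dominate (e.g., mostly <unk>), the model can memorize
# the pattern almost immediately.
ids = Counter(batch["input_ids"].flatten().tolist())
print("most common token ids:", ids.most_common(5))
```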

Hi, have you resolved your issue? I have a similar problem.

@008karan @dinesh I have a similar problem (post here). Have you resolved your issue?

@sgugger I’m using a validation set too, but the train loss becomes zero within 4 epochs. I tried to continue training for another 4 epochs with a lower learning rate, but the train loss starts at zero, so I see no improvement.

Hi, I’m running into a similar problem. :dizzy_face:
Have you guys fixed it? Any help?