Loss becoming nearly zero in first 5K steps when training LM from scratch

I am training an ALBERT LM from scratch.

I have already trained it for Hindi and Bangla and it was working fine, but when I train on Gujarati and Telugu, the loss drops to nearly zero within 5K steps.

What could be the reason for the sudden drop in the loss? Can anyone suggest a cause, or how to debug such an issue?

Usually, this means you are training on your validation data, so I’d triple-check that your training set and validation set don’t contain the same texts, and that there is no leak from one to the other.
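A quick way to check for exact-duplicate leakage between the two files (file paths here are hypothetical placeholders; near-duplicates would need fuzzier matching):

```python
def load_lines(path):
    """Read a corpus file into a set of non-empty, stripped lines."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}

def leakage(train_lines, valid_lines):
    """Return validation lines that also appear verbatim in training."""
    return set(train_lines) & set(valid_lines)

# Hypothetical paths -- point these at your actual corpora:
# train = load_lines("train.gu.txt")
# valid = load_lines("valid.gu.txt")
# print(f"{len(leakage(train, valid))} of {len(valid)} validation lines leak")
```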

I am not using any validation data. I masked the raw corpus after SentencePiece tokenization and fed it directly to training.
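For reference, the masking step follows the standard BERT/ALBERT recipe; a minimal sketch over SentencePiece token ids (`MASK_ID` and `VOCAB_SIZE` are illustrative placeholders, not values from my setup):

```python
import random

MASK_ID, VOCAB_SIZE = 4, 32000  # placeholder ids for illustration

def mask_tokens(ids, mask_prob=0.15, seed=None):
    """Pick ~15% of positions; of those, 80% become [MASK], 10% a random
    token, 10% stay unchanged. Labels are -100 (ignored by the loss)
    everywhere except the selected positions."""
    rng = random.Random(seed)
    inputs, labels = list(ids), [-100] * len(ids)
    for i, tok in enumerate(ids):
        if rng.random() < mask_prob:
            labels[i] = tok                 # model must predict the original
            r = rng.random()
            if r < 0.8:
                inputs[i] = MASK_ID
            elif r < 0.9:
                inputs[i] = rng.randrange(VOCAB_SIZE)
            # else: keep the original token
    return inputs, labels
```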

Then it’s just the model learning your training set. If you really want to know how it would fare on new data, you need to use a validation set and compute the loss on it.
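The evaluation loop itself is cheap to add; a minimal sketch of computing held-out loss with gradients disabled (the tiny linear model here is just a stand-in for ALBERT, and the data is random; only the pattern matters):

```python
import torch
from torch import nn

def eval_loss(model, loss_fn, batches):
    """Average loss over validation batches, with no gradient bookkeeping."""
    model.eval()
    total, n = 0.0, 0
    with torch.no_grad():
        for inputs, targets in batches:
            total += loss_fn(model(inputs), targets).item()
            n += 1
    model.train()
    return total / max(n, 1)

model = nn.Linear(4, 2)                       # stand-in for the LM
loss_fn = nn.CrossEntropyLoss()
val_batches = [(torch.randn(8, 4), torch.randint(0, 2, (8,)))]
print(f"validation loss: {eval_loss(model, loss_fn, val_batches):.4f}")
```

If full evaluation is too slow, running this on a small fixed subsample every few thousand steps is usually enough to catch overfitting or leakage.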

I used an eval set for Hindi, but evaluation was taking a lot of time, so I went without eval data.

But without eval data you don’t know how your model actually performs. You may be overfitting the training dataset.

Agreed. But what could be the cause of the sudden drop in loss?