Loss becoming nearly zero in first 5K steps when training LM from scratch

@008karan @dinesh I have a similar problem (here is the post). Have you resolved your issue?

@sgugger I'm using a validation set too, but the train loss becomes zero within 4 epochs. I tried continuing the training for another 4 epochs with a lower learning rate, but the train loss starts at zero, so I see no improvement.