I am training GPT-J with DeepSpeed. The training loss keeps decreasing, which is good, but the validation loss starts climbing as early as epoch 2. The oddest part is that the training-loss drop and the validation-loss jump appear synchronized, and they always happen at an epoch boundary. It doesn't look right, but I have no clue what the cause is. Does anyone have suggestions on what to investigate? I tried adding 10 warmup steps and reducing the learning rate from the default 5e-5 to 2e-5, but nothing changed. I always see this "stair-step" learning curve.
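For reference, the relevant parts of my Trainer setup look roughly like this (a minimal sketch; the output directory, epoch count, batch size, and DeepSpeed config path are placeholders, not my exact values):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",                 # placeholder
    num_train_epochs=4,               # placeholder
    per_device_train_batch_size=4,    # placeholder
    learning_rate=2e-5,               # lowered from the default 5e-5
    warmup_steps=10,                  # the short warmup I mentioned
    evaluation_strategy="epoch",      # validation runs at each epoch boundary
    deepspeed="ds_config.json",       # placeholder path to my DeepSpeed config
)
```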
Hi @dunalduck0, are you using TensorFlow or PyTorch, out of curiosity?
I know this post is old, but did you manage to find an answer in the end? I'm facing a similar issue.
Increasing the dataset size, reducing the learning rate, or turning off shuffling in the train dataloader can alleviate or eliminate this phenomenon.
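If you're using the Hugging Face Trainer, one way to turn off shuffling is to subclass it and return a sequential sampler (a minimal sketch; `_get_train_sampler` is a private Trainer method, so the exact hook may vary across transformers versions, and `NoShuffleTrainer` is just a name I made up):

```python
from torch.utils.data import SequentialSampler
from transformers import Trainer

class NoShuffleTrainer(Trainer):
    def _get_train_sampler(self):
        # Return a sequential sampler so the train dataloader iterates
        # the dataset in a fixed order instead of shuffling each epoch.
        if self.train_dataset is None:
            return None
        return SequentialSampler(self.train_dataset)
```

With plain PyTorch, the equivalent is simply passing `shuffle=False` when you build the `DataLoader`.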
Thanks for your advice. Turning off shuffling for the train dataloader did work for me, but I wonder why it works. Maybe turning off shuffling provides a better data distribution? I used to attribute this loss drop to memorization until I turned off shuffling as you advised. It is really confusing to me.
Thanks, turning off train shuffling solved it for me as well.