GPT loss increasing

Hi! I’m finetuning a GPT-2 medium model (~360M params) on a huge training set (dozens of GB) with a low learning rate (1.5e-5, i.e. 0.000015). The loss dips at the start, then gradually increases for a long stretch, and I haven’t seen it start to fall again yet. Is this behavior normal/expected?

Hey @treeofknowledge, this plot suggests your learning rate is too high after ~25k training steps, so you could try a learning rate scheduler like the default one provided by transformers.Trainer (see here).
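Something like the sketch below is what I mean — not your exact setup (the toy dataset, warmup_steps, and batch size are placeholders I made up), but passing `lr_scheduler_type="linear"` plus some `warmup_steps` to `TrainingArguments` is how you get the Trainer's default warmup-then-linear-decay schedule:

```python
from torch.utils.data import Dataset
from transformers import (
    GPT2LMHeadModel,
    GPT2TokenizerFast,
    Trainer,
    TrainingArguments,
)

class ToyDataset(Dataset):
    """Tiny stand-in for your real tokenized corpus (placeholder only)."""
    def __init__(self, tokenizer, block_size=64):
        text = "hello world " * 2000  # placeholder text
        ids = tokenizer(text, return_tensors="pt")["input_ids"][0]
        n = len(ids) // block_size
        self.blocks = ids[: n * block_size].view(n, block_size)

    def __len__(self):
        return self.blocks.size(0)

    def __getitem__(self, i):
        block = self.blocks[i]
        # For causal LM fine-tuning, the labels are the input ids themselves.
        return {"input_ids": block, "labels": block.clone()}

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2-medium")
model = GPT2LMHeadModel.from_pretrained("gpt2-medium")

args = TrainingArguments(
    output_dir="gpt2-medium-finetuned",
    learning_rate=1.5e-5,          # same starting LR as in your question
    lr_scheduler_type="linear",    # Trainer's default: linear decay to zero
    warmup_steps=500,              # short warmup before the decay kicks in
    num_train_epochs=1,
    per_device_train_batch_size=4,
    logging_steps=100,
)

trainer = Trainer(model=model, args=args, train_dataset=ToyDataset(tokenizer))
trainer.train()
```

With the decay in place, the effective LR around step 25k will be well below 1.5e-5, which should stop the slow divergence you’re seeing.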
