I’m trying to fine-tune the distilled GPT model on a new dataset and I’m having issues with the loss diverging during training. Can anyone think of why this might be happening? I’ve lowered the learning rate to 1e-7, which feels extremely low, and the issue still persists.
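For context, one failure mode I’m trying to rule out is numerical overflow rather than step size: if logits or activations blow up (a bad sample, fp16, etc.), the loss goes to inf/NaN and no learning rate will fix it. Here’s a toy illustration of the idea, not code from my actual training run (`naive_softmax` and `stable_softmax` are made-up helpers):

```python
import math

def naive_softmax(logits):
    # math.exp overflows for large inputs, so this blows up
    # no matter how small the learning rate is
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def stable_softmax(logits):
    # subtracting the max keeps every exponent <= 0, avoiding overflow
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [1000.0, 0.0]  # a logit this large can appear when activations explode
try:
    naive_softmax(logits)
except OverflowError:
    print("naive softmax overflowed")
print(stable_softmax(logits))  # well-behaved: approximately [1.0, 0.0]
```

Does something like this sound plausible, or should I be looking at the data pipeline instead?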