Abnormal learning rate curve

I am working on a school project to classify news headlines; it's a binary classification task. I scraped the news headlines and split them with sklearn's train_test_split, then used ktrain with DistilBERT to classify them. ktrain has a learning rate finder function; when I run it, I get an abnormal learning rate curve, as shown in the image below:


while a normal learning rate curve should be roughly U-shaped: the loss falls gradually from a higher value and then rises again.

What does that abnormal learning rate curve imply? Does it have anything to do with overfitting? I am really new to transformers and there are not many resources on the internet, so I'm asking here. Thanks.
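For reference, my setup roughly follows this pattern (a minimal sketch of what I'm describing; the model name, maxlen, class names, and batch size here are placeholders, not the exact values from my notebook):

```python
from sklearn.model_selection import train_test_split
import ktrain
from ktrain import text

# split the scraped headlines (binary labels: 0 / 1)
x_train, x_test, y_train, y_test = train_test_split(
    headlines, labels, test_size=0.2, random_state=42
)

# wrap the pretrained DistilBERT from Hugging Face via ktrain
t = text.Transformer('distilbert-base-uncased', maxlen=64,
                     class_names=['class_0', 'class_1'])
trn = t.preprocess_train(x_train, y_train)
val = t.preprocess_test(x_test, y_test)

model = t.get_classifier()
learner = ktrain.get_learner(model, train_data=trn, val_data=val, batch_size=16)

# the learning rate finder that produces the curve in the image above
learner.lr_find(show_plot=True, max_epochs=2)
```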

Hi, I'm not sure if I understand correctly:

  • Which “learning rate finder function” do you use? (Just curious, as I'm familiar with the idea from the fast.ai team.)
  • The spike at 10^-1 looks plausible to me, since that is a very large learning rate.
  • Did you initialize your model from a checkpoint? If yes, maybe the model is already good and has a small loss at the beginning, so a small loss at LR = 10^-7 is also plausible.

I just followed this step-by-step tutorial of ktrain; it is a lightweight wrapper around Hugging Face Transformers, and it has the learning rate finder function. I am not sure about the initializing-from-a-checkpoint part: does that mean a pretrained model? As far as I know, the tutorial uses the pretrained DistilBERT from Hugging Face.

Also, the learning rate curve demonstrated in the tutorial looks like this:


That’s why I think mine is not a normal one.

Yes, the pretrained model (= the checkpoint I mentioned above) can be one reason. Maybe you can try removing the pretrained weights (initialize from scratch) and plot the graph again.
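If you want to try that comparison, one way is to go through the underlying transformers library directly rather than ktrain's wrapper (just a sketch; the model class and checkpoint name are the usual Hugging Face ones, not something specific to the tutorial):

```python
from transformers import DistilBertConfig, TFDistilBertForSequenceClassification

# pretrained weights, i.e. initialized from the DistilBERT checkpoint
pretrained = TFDistilBertForSequenceClassification.from_pretrained(
    'distilbert-base-uncased', num_labels=2
)

# same architecture, but randomly initialized (no pretrained checkpoint)
config = DistilBertConfig(num_labels=2)
from_scratch = TFDistilBertForSequenceClassification(config)
```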

Note that the learning rate curve you posted does not seem to be the actual lr used during training, but the output of a utility function that tests out different lrs to see how they influence the loss. Its goal is to help you find a good starting lr. In other words, the graph is not the learning rate changing over time, but the loss over a range of different lr values. In that sense, it is very normal that the curve differs between checkpoints, architectures, and even datasets.
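In pseudocode, the idea behind such a finder (this is the general fast.ai-style LR range test, not ktrain's exact implementation; the function and argument names are my own) is roughly:

```python
import numpy as np
import tensorflow as tf

def lr_range_test(model, batches, start_lr=1e-7, end_lr=10.0):
    """Sketch of an LR range test for a compiled tf.keras model.

    `batches` is an iterable of (x, y) tuples. The learning rate is
    increased exponentially from start_lr to end_lr, one step per batch,
    and the training loss is recorded after each step. Plotting `losses`
    against `lrs` gives a finder-style curve (loss vs. lr) -- it is not
    the lr schedule of the real training run."""
    batches = list(batches)
    lrs = np.geomspace(start_lr, end_lr, num=len(batches))
    losses = []
    for lr, (x, y) in zip(lrs, batches):
        tf.keras.backend.set_value(model.optimizer.learning_rate, lr)
        losses.append(model.train_on_batch(x, y))
    return lrs, losses
```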

Cool to know all this, thanks both!