I am working on a school project to classify news headlines, which is a binary classification task. I scraped the headlines, split them with sklearn's train_test_split, and then used ktrain with DistilBERT to classify them. ktrain has a learning rate finder function; when I run it, I get an abnormal learning rate curve, as shown in the image below:
By contrast, a normal learning rate curve should be roughly U-shaped: the loss falls gradually from a higher value and then rises again.
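To make the expected shape concrete, here is a minimal sketch of a learning rate range test on a toy quadratic loss. It is only an illustration of the general idea and not how ktrain implements its finder; the function name `lr_range_test` and all parameter values are hypothetical, chosen just for this example.

```python
def lr_range_test(start_lr=1e-4, end_lr=10.0, steps=100, curvature=1.0, w0=5.0):
    """Toy learning rate range test on the loss 0.5 * curvature * w**2.

    The learning rate grows exponentially from start_lr to end_lr,
    one gradient descent step is taken per learning rate, and the loss
    seen before each step is recorded.
    """
    # Multiplicative factor so that the last recorded rate equals end_lr.
    factor = (end_lr / start_lr) ** (1.0 / (steps - 1))
    w, lr = w0, start_lr
    lrs, losses = [], []
    for _ in range(steps):
        loss = 0.5 * curvature * w * w   # loss at the current weight
        lrs.append(lr)
        losses.append(loss)
        grad = curvature * w             # derivative of the toy loss
        w -= lr * grad                   # plain gradient descent step
        lr *= factor                     # raise the learning rate
    return lrs, losses


lrs, losses = lr_range_test()
best = min(range(len(losses)), key=losses.__getitem__)
print(f"lowest loss at step {best}, lr = {lrs[best]:.3g}")
```

On this toy problem the loss barely moves while the rate is tiny, drops as the rate grows, and blows up once the rate is too large for the curvature, which gives the fall-then-rise shape described above. A real model's curve is noisier because each step uses a different mini-batch.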
What does this abnormal learning rate curve imply? Does it have something to do with overfitting? I am new to transformers and could not find many resources online, so I am asking here. Thanks.