Many transformer models, such as BERT and GPT, perform well on large datasets, but what about fine-tuning them on smaller, highly specialized datasets? How can one determine the optimal learning rate or batch size in such cases?
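For context, the usual approach when no single "optimal" value is known is a small grid or random search over learning rate and batch size, scored on a held-out validation set. Here is a minimal, hypothetical sketch of that idea using a toy logistic-regression model on synthetic data as a stand-in for a transformer (the dataset, model, and hyperparameter values are all illustrative assumptions, not anyone's recommended settings):

```python
# Illustrative sketch: grid search over learning rate and batch size.
# The "model" is toy logistic regression on synthetic data; for a real
# transformer you would swap in your fine-tuning loop and typical
# learning rates (often around 1e-5 to 5e-4).
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "small, specialized" dataset: 200 examples, 16 features.
X = rng.normal(size=(200, 16))
true_w = rng.normal(size=16)
y = (X @ true_w > 0).astype(float)
X_train, y_train = X[:150], y[:150]
X_val, y_val = X[150:], y[150:]

def train_and_eval(lr, batch_size, epochs=20):
    """Train with mini-batch SGD; return validation log loss."""
    w = np.zeros(X_train.shape[1])
    n = len(X_train)
    for _ in range(epochs):
        idx = rng.permutation(n)
        for start in range(0, n, batch_size):
            b = idx[start:start + batch_size]
            p = 1 / (1 + np.exp(-X_train[b] @ w))        # sigmoid
            grad = X_train[b].T @ (p - y_train[b]) / len(b)
            w -= lr * grad
    p_val = 1 / (1 + np.exp(-X_val @ w))
    eps = 1e-9  # guard against log(0)
    return -np.mean(y_val * np.log(p_val + eps)
                    + (1 - y_val) * np.log(1 - p_val + eps))

# Small grid; on small datasets, smaller batches and modest learning
# rates tend to be safer starting points.
grid = [(lr, bs) for lr in (0.01, 0.1, 0.5) for bs in (8, 16, 32)]
results = {(lr, bs): train_and_eval(lr, bs) for lr, bs in grid}
best = min(results, key=results.get)
print("best (lr, batch_size):", best)
```

The same loop structure applies to a real fine-tuning run; the only change is that `train_and_eval` would wrap your actual training code, and each trial would be far more expensive, which is why people often use random search or early stopping instead of a full grid.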
Following this post with curiosity; great question.
Is there anybody here who can offer guidance on this? Ideally, they'd also know something about grooming a show cocker.
I’m here, but it’s too technical for me to answer…
It’s a subject that someone might write a paper or an article on.