I have a general question and would appreciate your feedback on it.
I am new to transformers. My main problem is that the model overfits very quickly. I am using regularization methods such as data augmentation and dropout, but after 2 epochs the validation accuracy starts to drop while the training accuracy reaches its highest value (in other words, the model overfits).
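To make the symptom concrete, here is a minimal sketch of how the epoch where validation accuracy peaks, and the final train/validation gap, could be read off per-epoch accuracy lists. The helper name `overfit_onset` and the numbers are illustrative placeholders, not measured results; they only mimic the pattern described above.

```python
def overfit_onset(train_acc, val_acc):
    """Return (best_epoch, final_gap).

    best_epoch: 1-based epoch with the highest validation accuracy
                (the first one, if several epochs tie).
    final_gap:  training accuracy minus validation accuracy at the last epoch.
    """
    best_epoch = max(range(len(val_acc)), key=lambda i: val_acc[i]) + 1
    final_gap = train_acc[-1] - val_acc[-1]
    return best_epoch, final_gap


# Placeholder curves (assumed, not real results): validation accuracy peaks
# at epoch 2 and then falls, while training accuracy keeps rising.
train_acc = [0.60, 0.85, 0.95, 0.99]
val_acc = [0.58, 0.72, 0.70, 0.66]

best_epoch, final_gap = overfit_onset(train_acc, val_acc)
print(f"validation accuracy peaks at epoch {best_epoch}; "
      f"final train/val gap = {final_gap:.2f}")
```

A peak this early combined with a large final gap is the pattern I mean by "overfits very quickly".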
Do you have any suggestions?
Interestingly, I never see this behavior when I use convolutions…