My network (a multimodal transformer for text and vision) badly overfits the training set after 4 epochs. I tried many combinations of dropout strength, weight decay, learning rate schedule, and so on.
The only thing that changes is the number of epochs it takes for the train and validation scores to end up at the same values. Example:
Without any regularization, best scores at epoch 4: train ROC AUC 99%, val ROC AUC 64%
With 40% dropout and weight decay, best validation score across all epochs at epoch 30: train ROC AUC 99%, val ROC AUC 64%
This is just one example. I tried many different regularization strengths and I always end up with equally bad scores on the validation data. The only thing that varies is the number of epochs it takes to get there.
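To make "best validation score across all epochs" concrete, here is a minimal sketch of how such a comparison could be computed. It is an illustration only, not the actual training code: the helper names `roc_auc` and `best_epoch` and the toy numbers are made up, and it assumes binary labels and one (train, val) ROC AUC pair recorded per epoch.

```python
from typing import List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    Quadratic in the number of samples, so only suitable for small checks.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def best_epoch(history: List[Tuple[float, float]]) -> Tuple[int, float, float]:
    """Pick the epoch with the highest validation AUC.

    history[i] holds (train_auc, val_auc) for epoch i + 1 (hypothetical layout).
    Returns (epoch, train_auc, val_auc); the earliest epoch wins on ties.
    """
    idx = max(range(len(history)), key=lambda i: history[i][1])
    train_auc, val_auc = history[idx]
    return idx + 1, train_auc, val_auc


# Toy data, invented for illustration only.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75
print(best_epoch([(0.70, 0.60), (0.90, 0.64), (0.99, 0.62)]))  # → (2, 0.9, 0.64)
```

Reporting the train score at the epoch where validation peaks, rather than the final train score, makes the size of the train/validation gap easier to compare across runs.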
The baseline method achieves a validation ROC AUC of 80%, so there is definitely room for improvement.
What am I doing wrong?
Do you have any suggestions?