T5-base results are worse than T5-small

Hi everyone,

I pretrained T5 small, base, and large on the PrivaSeer corpus with a span-masking MLM (span corruption) objective, and I named the resulting models PrivaT5. I then fine-tuned both PrivaT5 and the original T5 (small, base, and large) on some tasks of the PrivacyGLUE benchmark. You can see the results in these plots:

For all model sizes I used the same hyperparameters, except for the batch size, which I changed so that each model fits on the TPU. Example:
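
To make the batch-size difference concrete, here is a rough sketch of what changing only the batch size implies. All numbers (`TARGET_EFFECTIVE_BATCH`, `MAX_BATCH_PER_STEP`, `NUM_TRAIN_EXAMPLES`) and both helper functions are hypothetical placeholders, not the settings used in these runs. It shows that, with everything else fixed, a smaller batch means more optimizer updates per epoch, and how gradient accumulation could be used to keep the effective batch size equal across model sizes.

```python
# Hypothetical values for illustration only; NOT the settings used in these runs.
TARGET_EFFECTIVE_BATCH = 128
NUM_TRAIN_EXAMPLES = 3000

# Assumed largest per-step batch that fits in TPU memory for each size (made up).
MAX_BATCH_PER_STEP = {"small": 128, "base": 32, "large": 8}


def accumulation_steps(target_batch: int, per_step_batch: int) -> int:
    """Gradient-accumulation steps needed to reach target_batch."""
    if target_batch % per_step_batch != 0:
        raise ValueError("target batch must be a multiple of the per-step batch")
    return target_batch // per_step_batch


def updates_per_epoch(num_examples: int, effective_batch: int) -> int:
    """Optimizer updates in one pass over the data (last partial batch kept)."""
    return -(-num_examples // effective_batch)  # ceiling division


for size, per_step in MAX_BATCH_PER_STEP.items():
    # Without accumulation: the update count grows as the batch shrinks.
    plain = updates_per_epoch(NUM_TRAIN_EXAMPLES, per_step)
    # With accumulation: every size sees the same effective batch.
    accum = accumulation_steps(TARGET_EFFECTIVE_BATCH, per_step)
    matched = updates_per_epoch(NUM_TRAIN_EXAMPLES, per_step * accum)
    print(f"{size}: batch={per_step}, updates/epoch={plain}, "
          f"accum_steps={accum}, matched updates/epoch={matched}")
```

If the learning rate and number of epochs stay the same while the batch size shrinks, the larger models take more, noisier updates, so the runs are not strictly comparable across sizes.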


Could anyone suggest possible reasons why PrivaT5-base unexpectedly performs worse than PrivaT5-small on the OPP-115 and Policy-Detection tasks (multilabel and binary text classification, respectively)?

Thank you!

