I am trying to fine-tune t5-large for summarization on XSum. I followed the recommended steps, such as:
- using Adafactor instead of AdamW,
- initializing the models and datasets in the global scope instead of inside `_mp_fn`.
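For reference, here is a minimal sketch of the Adafactor setup usually recommended in the T5 Finetuning Tips thread (fixed learning rate, `relative_step=False`, `scale_parameter=False`). I am using a small dummy module in place of t5-large here just to keep the snippet lightweight; the hyperparameter values are the commonly suggested ones, not something verified for my exact run:

```python
import torch
from transformers.optimization import Adafactor

# Dummy module standing in for t5-large (loading the real checkpoint
# here would need several GB of RAM; this is only an illustration).
model = torch.nn.Linear(8, 8)

# Commonly recommended Adafactor settings for T5 fine-tuning:
# a fixed external LR instead of the relative-step schedule,
# and no parameter-scale-based LR scaling.
optimizer = Adafactor(
    model.parameters(),
    lr=1e-3,
    scale_parameter=False,
    relative_step=False,
    warmup_init=False,
    weight_decay=0.0,
)

# One dummy forward/backward/step to confirm the optimizer runs.
loss = model(torch.randn(4, 8)).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

This mirrors what I am doing in my actual training loop, just with the real model and dataloader swapped in.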
I would like to know what I am doing wrong. I am running on Kaggle, where 16 GB of RAM should be sufficient for this according to the T5 Finetuning Tips thread. Any guidance would be appreciated.