T5 Finetuning Tips

Hi guys!
I just finished training T5-large on ELI5 (270,000 examples) using a TPU V2-8 on Colab, modified from @valhalla's notebook! These are not really finetuning tips, but rather some tips to make T5-large trainable on a TPU V2-8.

T5-large is challenging to train on TPU V2-8 with PyTorch (for me)

  • I faced a lot of memory problems (even on a Colab High-RAM instance); this notebook by Davide Libenzi, one of the XLA authors, suggests declaring the large model outside _mp_fn (see his mx variable) — see the sketch after this list
  • with T5-base, there is around 7 minutes of overhead before training can start; for T5-large, this overhead was about 1 hour for me
  • with max_length = 128 (for both input and target), I am able to set per_device_train_batch_size = 4 (so global_batch_size = 4*8 = 32)
  • there is an issue where xm.save() causes memory errors with large models like XLM-RoBERTa; it happens to T5-large too, so I have to bypass the default save_steps of Trainer by setting it to 1000000
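For reference, here is a minimal sketch of the "declare the model outside _mp_fn" pattern. It is not the exact code from Davide's notebook (his mx variable may differ in details); I use xmp.MpModelWrapper here as one way to keep a single host copy of the weights that the forked processes can share:

```python
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_multiprocessing as xmp
from transformers import T5ForConditionalGeneration

# Build the large model ONCE in the parent process, outside _mp_fn.
# MpModelWrapper keeps a single host copy of the weights and moves them to
# each TPU core on demand, instead of loading 8 separate copies into RAM.
WRAPPED_MODEL = xmp.MpModelWrapper(
    T5ForConditionalGeneration.from_pretrained("t5-large")
)

def _mp_fn(index):
    device = xm.xla_device()
    model = WRAPPED_MODEL.to(device)  # weights are sent to this core's device
    # ... build the dataset, optimizer and training loop (or Trainer) here ...

if __name__ == "__main__":
    # start_method="fork" is what lets the children share the parent's memory
    xmp.spawn(_mp_fn, nprocs=8, start_method="fork")
```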

Combining all of these took me around a day to get a trainable notebook, so hopefully these tricks can be useful to some of you too! The key settings boil down to something like the sketch below.
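This is an illustration rather than my exact notebook: the dataset field names and output_dir are placeholders, and the Trainer would be run inside _mp_fn as in the sketch above.

```python
from transformers import T5Tokenizer, TrainingArguments

tokenizer = T5Tokenizer.from_pretrained("t5-large")

def encode(example):
    # Clip both the input question and the target answer to 128 tokens.
    # "question" / "answer" are placeholder field names for the ELI5 examples.
    model_inputs = tokenizer(example["question"], max_length=128,
                             truncation=True, padding="max_length")
    labels = tokenizer(example["answer"], max_length=128,
                       truncation=True, padding="max_length")
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

training_args = TrainingArguments(
    output_dir="t5-large-eli5",          # placeholder path
    per_device_train_batch_size=4,       # 4 per core x 8 cores = global batch of 32
    save_steps=1_000_000,                # effectively skips xm.save() checkpoints
    num_train_epochs=1,
)
```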

I would like to find time to make a TF2 version, which should be more stable on TPU :slight_smile:

More notes

  • As @valhalla mentioned in his notebook, a High-RAM instance is a must. Recently Kaggle notebooks increased RAM to 16GB for TPU V3-8, but I could not get the training to succeed (sadly, since V3-8 should be 2x faster than V2-8)