I am training mT5 on a large dataset (118,208 sentences), and it's taking upwards of 3,149 hours on Google Colab using a TPU.
I trained on the same dataset with T5, which took 7 days. I assume the difference is due to the vocabulary size (T5 uses 32k wordpieces, while mT5 uses 250k).
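To illustrate where I think the extra cost comes from, here's a minimal sketch (assuming the Hugging Face `transformers` library and the standard `t5-base` / `google/mt5-base` checkpoints; swap in whatever sizes you actually use) that compares how much of each model's parameter count sits in the vocabulary embedding, which grows with vocab size and makes every forward/backward pass heavier:

```python
# Compare vocab size and embedding footprint of T5 vs mT5.
# Assumes: transformers + torch installed, and the t5-base /
# google/mt5-base checkpoints (substitute your own model sizes).
from transformers import T5ForConditionalGeneration, MT5ForConditionalGeneration

t5 = T5ForConditionalGeneration.from_pretrained("t5-base")
mt5 = MT5ForConditionalGeneration.from_pretrained("google/mt5-base")

for name, model in [("t5-base", t5), ("mt5-base", mt5)]:
    total = sum(p.numel() for p in model.parameters())
    # Input embedding table: vocab_size x d_model parameters.
    embed = model.get_input_embeddings().weight.numel()
    print(f"{name}: vocab={model.config.vocab_size}, "
          f"embedding params={embed:,} ({embed / total:.0%} of {total:,} total)")
```

If I'm reading the configs right, the ~8x larger vocabulary means the mT5 embedding table (and its output projection) is a much bigger share of the model, but I wouldn't expect that alone to turn 7 days into 3,149 hours.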
I have tried playing around with the learning rate (the highest I've tried being 4 and the lowest 2e-05), but nothing seems to help.
Does anybody have any ideas?