T5 evaluation via Trainer `predict_with_generate` extremely slow on TPU?

Here is a Colab notebook demonstrating the issue: Google Colab

After the initial period of XLA compilation, training proceeds quickly. But when evaluation rolls around at the end of an epoch, it is extremely slow. I assumed there might just be another initial period of slowness, but after 25 minutes the estimated evaluation time was still around 6 hours. For reference, a full evaluation pass on a P100 completed in ~9 minutes.
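
For context, here is roughly what the setup looks like; a minimal sketch where the model name, dummy dataset, and hyperparameters are placeholders rather than the exact values from my notebook:

```python
# Minimal sketch of a Seq2SeqTrainer setup using predict_with_generate.
# Model, data, and hyperparameters below are illustrative placeholders.
from datasets import Dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

def preprocess(batch):
    inputs = tokenizer(batch["src"], truncation=True, max_length=64)
    labels = tokenizer(batch["tgt"], truncation=True, max_length=64)
    inputs["labels"] = labels["input_ids"]
    return inputs

# Tiny dummy dataset so the sketch is self-contained.
raw = Dataset.from_dict({"src": ["translate English to German: Hello"] * 8,
                         "tgt": ["Hallo"] * 8})
ds = raw.map(preprocess, batched=True, remove_columns=["src", "tgt"])

args = Seq2SeqTrainingArguments(
    output_dir="t5-out",
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    evaluation_strategy="epoch",
    predict_with_generate=True,   # evaluation calls model.generate()
    generation_max_length=64,
    num_train_epochs=1,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=ds,
    eval_dataset=ds,
    tokenizer=tokenizer,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)

# Training steps are fast once XLA compilation finishes;
# the generate()-based evaluation is the part that crawls on TPU.
trainer.train()
```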

I found an old notebook by @valhalla (Google Colab) where he says:

Second, for some reason which I couldn’t figure out, the .generate method is not working on TPU so will need to do prediction on CPU.

It’s unclear to me whether “not working” means generation fails outright on TPU, or whether he hit the same slowness I’m seeing.
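
If generation really can’t run efficiently on TPU, my fallback would be something like the sketch below, i.e. running generate on CPU after training. The helper name and arguments here are just my own illustration, not anything from his notebook:

```python
# Rough sketch of the "predict on CPU" workaround: move the model off the
# TPU/XLA device and run generate() in batches on CPU.
import torch

def generate_on_cpu(model, tokenizer, texts, max_length=64, batch_size=8):
    """Generate predictions on CPU to sidestep the slow TPU generate() path."""
    cpu_model = model.to("cpu").eval()
    outputs = []
    for i in range(0, len(texts), batch_size):
        batch = tokenizer(texts[i:i + batch_size], return_tensors="pt",
                          padding=True, truncation=True, max_length=max_length)
        with torch.no_grad():
            generated = cpu_model.generate(**batch, max_length=max_length)
        outputs.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return outputs
```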

Anyone run into this issue?

+1 I have a similar issue when fine-tuning on GPU. Training doesn’t take long, but predictions on the development set take far too long.