TPU slow finetuning T5-base

GenV · February 11, 2022, 3:27pm

Hi,
I am trying to fine-tune T5-base on Colab with the TPU. I am using the official code to fine-tune T5-base on my own dataset, but training on the TPU is extremely slow!

I am attaching the colab code with the various libraries I have installed: notebook.

Also, if I try to increase the batch size to 64 or more, I get a memory error, as there seems to be only about 8 GB available.

Can someone help me? Thank you!

finiteautomata · February 16, 2022, 1:35pm

Could you please post how you run this code? I mean, do you use xla_spawn.py, or are you running it inline in a notebook?

GenV · February 16, 2022, 1:37pm

@finiteautomata I’m running it online on Google Colab, with a Google Colab Pro subscription.

finiteautomata · February 16, 2022, 1:54pm

@GenV sorry, I didn’t see your notebook link. You are missing the xla_spawn.py part, that is, the launcher that makes your script run in parallel. You should add this:

python xla_spawn.py --num_cores 8  t5.py \
    --model_name_or_path="t5-base" \
    --do_train \
    --do_eval \

etc etc

GenV · February 16, 2022, 2:05pm

@finiteautomata So I need this code, and then I run

python xla_spawn.py --num_cores 8 t5.py \ followed by all the other args of t5.py?

finiteautomata · February 16, 2022, 2:08pm

Exactly that. Try it and tell me what happens.
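For anyone launching from a notebook cell rather than a shell, here is a minimal sketch of assembling the same command as an argument list in Python. The helper name `build_xla_spawn_command` is made up for illustration; only `xla_spawn.py`, `--num_cores`, `t5.py`, and the three script arguments come from the posts above, and the remaining `t5.py` arguments are deliberately left out.

```python
import shlex


def build_xla_spawn_command(script, num_cores=8, script_args=()):
    """Return the argv list that launches `script` through xla_spawn.py.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    return ["python", "xla_spawn.py", "--num_cores", str(num_cores), script, *script_args]


cmd = build_xla_spawn_command(
    "t5.py",
    num_cores=8,
    script_args=["--model_name_or_path=t5-base", "--do_train", "--do_eval"],
)

# Show the command as it would be typed in a shell.
print(shlex.join(cmd))

# To actually launch it from Python (assuming xla_spawn.py and t5.py are in
# the working directory): subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems if a path or value contains spaces.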

GenV · February 16, 2022, 3:36pm

@finiteautomata I don’t know if it’s all working correctly. I get these messages:

1- Running tokenizer on train dataset: 0% 0/30 [00:00<?, ?ba/s]WARNING:t5:Process rank: -1, device: xla:0, n_gpu: 0distributed training: False, 16-bits training: False
Here the device is xla:0 and not xla:1, but maybe that is only for the tokenizer run.

2- huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks…
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

Now it’s fine-tuning, at step 1.
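The tokenizers warning in message 2 is usually harmless. As it suggests, it can be silenced by setting the environment variable explicitly. A minimal sketch, assuming the variable is set at the top of the script or notebook, before the tokenizer is first used and before any processes are forked:

```python
import os

# Disable the tokenizers library's internal parallelism so that forking
# worker processes later does not trigger the deadlock warning.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```

This only affects the warning and the tokenizer's own threading; it is not expected to change training speed by itself.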

GenV · February 16, 2022, 4:25pm

@finiteautomata I see that at the start the device is xla:1, then xla:0. Furthermore, the training seems to take a long time, about 90 hours (on GPU it takes less than 2); maybe it is somehow using the CPU?

phosseini · March 2, 2022, 11:29pm

@GenV I have the same problem when using the TPU in Colab (I have Google Colab Pro+). I was not using xla_spawn.py, so I gave it a try. Interestingly, the first time I ran my script with xla_spawn.py, training was faster. However, after terminating my node and reconnecting to the TPU, I cannot make it work again, and even with xla_spawn.py the training is very slow (so it was kind of random and I can’t reproduce it).

Did you figure something out?

phosseini · March 3, 2022, 3:10am

For folks who are still struggling, I think I found one potential reason why training on TPU is slow; look here. I set padding to True in my tokenizer and I’m already seeing a speedup in my TPU training. Basically, it looks like it didn’t have anything to do with my torch xla installation.

GenV · March 3, 2022, 2:19pm

Thank you @phosseini for your answer! Yes, I had the same issue with the randomness. I have read the discussion and it is interesting, so the fix would be padding = True (I was using False, if I’m not mistaken).

I also have the same question as you (whether you just need to set padding = False); I’ll wait for an answer in the other thread.

GenV · March 8, 2022, 9:38am

@phosseini I tried with pad_to_max_length=True and it’s working fine.
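A likely explanation for why padding matters here, stated as an assumption rather than something confirmed in this thread: XLA compiles a graph per input shape, so batches padded only to their own longest sequence can keep triggering recompilation, while batches padded to one fixed length reuse a single compiled graph. The sketch below is plain Python with made-up token ids and a hypothetical `pad_batch` helper; it only counts the distinct batch shapes each strategy produces. In transformers, fixed-length padding corresponds to calling the tokenizer with `padding="max_length"` together with `max_length` and `truncation=True`.

```python
def pad_batch(batch, pad_id=0, max_length=None):
    """Pad (and truncate) lists of token ids to a common length.

    If max_length is None, pad to the longest sequence in this batch
    (dynamic padding); otherwise pad every batch to max_length.
    """
    target = max_length if max_length is not None else max(len(ids) for ids in batch)
    return [ids[:target] + [pad_id] * (target - len(ids)) for ids in batch]


def batch_shape(padded):
    """Return (batch_size, sequence_length) of a padded batch."""
    return (len(padded), len(padded[0]))


# Three toy batches with different sequence lengths (illustrative ids only).
batches = [
    [[5, 6, 7], [8, 9]],
    [[1, 2, 3, 4, 5], [6]],
    [[3, 4], [5, 6, 7, 8]],
]

dynamic_shapes = {batch_shape(pad_batch(b)) for b in batches}
fixed_shapes = {batch_shape(pad_batch(b, max_length=8)) for b in batches}

print(len(dynamic_shapes))  # 3 distinct shapes: (2, 3), (2, 5), (2, 4)
print(len(fixed_shapes))    # 1 shape: (2, 8)
```

The trade-off is that fixed-length padding spends compute on pad tokens, so a `max_length` close to the real length of the data keeps that overhead small.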

deathcrush · June 16, 2022, 8:52am

Did you guys notice speedups vs GPU training? @GenV I have paid access to A100 GPUs but for side research tasks I’d like to use TPUs in case something works out…

GenV · June 17, 2022, 7:36am

@deathcrush The TPU is much faster than GPU training with an A100, P100, or V100.
