Hi,
I am trying to fine-tune T5-base on Colab with a TPU. I am using the official code (with my own dataset), but training on the TPU is extremely slow!
I am attaching the Colab notebook with the various libraries I have installed: notebook.
Also, if I try to increase the batch size to 64 or more, I get a memory error; there seems to be only about 8 GB available.
@GenV sorry, I didn’t see your notebook link. You are missing the xla_spawn.py part, i.e. the launcher that makes your training run on all the TPU cores in parallel. You should add this:
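(A sketch: the training script name and its arguments are placeholders; xla_spawn.py ships alongside the example scripts in the transformers repo, and a Colab TPU exposes 8 cores.)

    python xla_spawn.py --num_cores 8 your_training_script.py --model_name_or_path t5-base --output_dir output

Under the hood, xla_spawn.py just calls torch_xla.distributed.xla_multiprocessing.spawn() to start one training process per TPU core.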
@finiteautomata I don’t know if everything is working correctly; I get these prints:
1- Running tokenizer on train dataset: 0% 0/30 [00:00<?, ?ba/s] WARNING:t5:Process rank: -1, device: xla:0, n_gpu: 0, distributed training: False, 16-bits training: False
Where the device is xla:0 and not 1, but maybe that’s just for the tokenizer run.
2- huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks…
To disable this warning, you can either:
- Avoid using tokenizers before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
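To actually set it, a minimal sketch (the variable must be assigned at the top of the notebook, before any tokenizer is first used):

    import os

    # Set before tokenizers are used in the parent process; otherwise the
    # forked worker processes will still disable parallelism with this warning.
    os.environ["TOKENIZERS_PARALLELISM"] = "false"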
@finiteautomata I see that at the start xla = 1, then xla = 0. Furthermore, the training seems to take a long time, about 90 hours (on GPU it takes less than 2); maybe it is somehow using the CPU?
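As a sanity check (a minimal sketch, assuming torch_xla is installed in the runtime), you can print which device each process actually lands on and how many processes joined the training:

    import torch_xla.core.xla_model as xm

    print("device:", xm.xla_device())           # should be an xla:* device, not cpu
    print("ordinal:", xm.get_ordinal())         # this process's index among the TPU cores
    print("world size:", xm.xrt_world_size())   # 8 when all Colab TPU cores are in use

If the world size stays at 1 even under xla_spawn.py, only one core is doing the work, which would be consistent with a 90-hour estimate.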
@GenV I have the same problem when using the TPU in Colab (I have Google Colab Pro+). I was not using xla_spawn.py, so I gave it a try. Interestingly, the first time I ran my script with xla_spawn.py it made training faster; however, after terminating my node and reconnecting to the TPU, I cannot make it work again, and even with xla_spawn.py the training is very slow (so it was kind of random and I can’t reproduce it).
For folks who are still struggling, I think I found one potential reason why training on TPU is slow: look here. I set padding to True in my tokenizer and I’m already seeing a speedup in my TPU training, so it looks like it had nothing to do with my torch_xla installation.
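For context, XLA recompiles the computation graph whenever input shapes change, so variable-length batches are expensive on TPU. A minimal sketch of the tokenizer call (the column name and max_length are assumptions; padding="max_length" gives fully fixed shapes, while padding=True only pads to the longest example in each batch):

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("t5-base")

    def tokenize(batch):
        # Fixed-size padding keeps tensor shapes constant across batches,
        # so XLA compiles the graph once instead of recompiling per batch.
        return tokenizer(
            batch["text"],          # hypothetical column name
            padding="max_length",   # or padding=True, as in the post above
            truncation=True,
            max_length=512,         # assumed sequence length
        )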
Thank you @phosseini for your answer! Yes, I had the same issue with the randomness. I have read the discussion and it is interesting, so the fix would be padding = True (I was using False, if I’m not mistaken).
I also have the same question as you (whether you just need to set padding = False), and I’m waiting for an answer in the other thread.
Did you guys notice speedups vs. GPU training? @GenV I have paid access to A100 GPUs, but for side research tasks I’d like to use TPUs in case something works out…