When can we expect TPU Trainer?

Hi, I wanted to know when we can expect the Trainer API to support TPUs.

Can I implement it myself? Could you give me some tips on where to start?

Let me know,
Kind regards


The Trainer API does support TPUs. For example, the language modeling examples can be run on TPU. There’s one thing to take into account when training on TPUs:

Note: On TPU, you should use the flag --pad_to_max_length in conjunction with the --line_by_line flag to make sure all your batches have the same length.

You can take a look at the scripts for details.
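
To make that concrete, here is a minimal sketch of what those two flags boil down to (the model name and max_length below are placeholders, not values taken from the scripts): each line is tokenized on its own and padded to one fixed length, so every batch has the same shape and the TPU only has to compile once.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

# Two "lines" of a line-by-line dataset, tokenized with fixed-length padding,
# which is roughly what --line_by_line plus --pad_to_max_length give you.
lines = ["First training line.", "A second, somewhat longer training line."]
batch = tokenizer(
    lines,
    padding="max_length",  # pad to max_length, not to the longest line in the batch
    truncation=True,
    max_length=128,
    return_tensors="pt",
)
print(batch["input_ids"].shape)  # torch.Size([2, 128]) -> one shape for all batches
```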


Hi Nielsr,

I tried running the PyTorch WNUT17 Trainer example on both Kaggle and Colab TPUs, and neither seems to be working (although I made sure the TPUs were correctly configured and XLA was correctly installed).

Here’s the Colab notebook: Google Colab

Here’s the Kaggle notebook: https://www.kaggle.com/xhlulu/huggingface-wnut17-tpu-tests?scriptVersionId=83062978

As you can see, each iteration takes significantly longer than it would on a GPU (for comparison, total training time is ~1.5 min on a P100).


You are not padding your inputs and targets to a fixed size in this example, but dynamically padding them to the longest input/target in each batch. This causes the TPU to recompile at each step, so it’s normal to see a much longer training time than on GPUs.

To properly train on TPU, you need to apply fixed padding in tokenize_and_align_labels to a length of your choice, and pad the labels to that same length.
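
Something along these lines (a minimal sketch, assuming a fast tokenizer and the tokens/ner_tags columns from the WNUT17 example; the length of 128 is just a value to adjust):

```python
MAX_LENGTH = 128  # fixed length of your choice

def tokenize_and_align_labels(examples):
    # Pad every example to MAX_LENGTH so all batches have the same shape on TPU.
    tokenized = tokenizer(
        examples["tokens"],
        is_split_into_words=True,
        truncation=True,
        padding="max_length",
        max_length=MAX_LENGTH,
    )
    all_labels = []
    for i, tags in enumerate(examples["ner_tags"]):
        word_ids = tokenized.word_ids(batch_index=i)
        labels = []
        previous_word_id = None
        for word_id in word_ids:
            if word_id is None:
                labels.append(-100)           # special tokens and padding, ignored by the loss
            elif word_id != previous_word_id:
                labels.append(tags[word_id])  # label only the first sub-token of each word
            else:
                labels.append(-100)
            previous_word_id = word_id
        all_labels.append(labels)             # already MAX_LENGTH long, like the inputs
    tokenized["labels"] = all_labels
    return tokenized
```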


This was really helpful, thanks. Just one follow-up on this: if we’re using a data collator that takes a tokenizer as a parameter (e.g., DataCollatorForLanguageModeling), and we set padding=True for the tokenizer before passing it to the data collator, will it have the same effect? (I did this and I’m already seeing a speed-up in TPU training, but I’m not sure whether it’s really because of setting padding=True in the tokenizer.)

@sgugger Also, two quick questions; I’d appreciate your input:

  • Is there any way to speed up training on TPU with dynamic padding as well?
  • Can we use pad_to_multiple_of in the data collator to make it TPU-friendly, instead of changing the tokenization or the data collating process? (A sketch of what I mean is below.)
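
For concreteness, here is roughly the setup I have in mind for the second question (the model name and the value 128 are just placeholders, and whether this is enough to avoid recompilation is exactly what I’m asking):

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

# Still dynamic padding, but every batch is rounded up to a multiple of 128 tokens;
# the hope is that this limits how many distinct input shapes the TPU has to compile for.
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm_probability=0.15,
    pad_to_multiple_of=128,
)
```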