You are not padding your inputs and targets to a fixed size in this example, but dynamically padding them to the longest input/target in each batch. This causes the TPU to recompile at each step, so it’s normal to see a very long training time compared to GPUs.
To properly train on TPU, you need to apply fixed padding in `tokenize_and_align_labels` to a given length of your choice, and pad the labels to that same length.
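Here is a minimal sketch of what that could look like, assuming the usual token-classification preprocessing (a fast tokenizer and a dataset with `"tokens"` and `"ner_tags"` columns); the `MAX_LENGTH` value is just an example you should adapt to your data:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
MAX_LENGTH = 128  # fixed length of your choice

def tokenize_and_align_labels(examples):
    tokenized_inputs = tokenizer(
        examples["tokens"],
        truncation=True,
        is_split_into_words=True,
        padding="max_length",   # pad every sample to the same fixed length
        max_length=MAX_LENGTH,
    )

    labels = []
    for i, label in enumerate(examples["ner_tags"]):
        word_ids = tokenized_inputs.word_ids(batch_index=i)
        label_ids = []
        previous_word_idx = None
        for word_idx in word_ids:
            if word_idx is None:
                # Special tokens and padding tokens get -100 so they are
                # ignored by the loss; this also pads the labels to MAX_LENGTH.
                label_ids.append(-100)
            elif word_idx != previous_word_idx:
                label_ids.append(label[word_idx])
            else:
                label_ids.append(-100)
            previous_word_idx = word_idx
        labels.append(label_ids)

    tokenized_inputs["labels"] = labels
    return tokenized_inputs
```

With this, every batch has the same shape, so XLA compiles the training step once instead of recompiling whenever the longest sequence in the batch changes.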