Finetuning BERT on TPU is very slow

NightMachinery · August 11, 2022, 11:45am

I followed the Hugging Face course and decided to try fine-tuning BERT on a random classification dataset. Everything works, except that training is extremely slow on TPUs. (I already tokenize with padding='max_length', so every batch should have the same shape.)
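To show why the padding setting matters here: XLA compiles a separate program for each distinct input shape, so batches padded only to their own longest sequence can trigger repeated recompilation on TPU. Below is a toy sketch of that effect in plain Python, not the code from my notebook. The `pad_batch` and `batch_shape` helpers and the token ids are made up for illustration.

```python
from typing import List, Optional


def pad_batch(
    batch: List[List[int]],
    max_length: Optional[int] = None,
    pad_id: int = 0,
) -> List[List[int]]:
    """Pad (or truncate) each sequence to max_length.

    If max_length is None, pad to the longest sequence in this batch,
    which mimics dynamic padding.
    """
    target = max_length if max_length is not None else max(len(seq) for seq in batch)
    return [seq[:target] + [pad_id] * (target - len(seq)) for seq in batch]


def batch_shape(batch: List[List[int]]) -> tuple:
    """Return (batch_size, sequence_length) for a padded batch."""
    return (len(batch), len(batch[0]))


# Hypothetical token-id sequences, already grouped into batches of 2.
batches = [
    [[101, 7592, 102], [101, 2088, 2003, 102]],
    [[101, 102], [101, 1037, 2460, 6251, 2182, 102]],
    [[101, 2178, 102], [101, 102]],
]

# Dynamic padding: the sequence length varies from batch to batch.
dynamic_shapes = {batch_shape(pad_batch(b)) for b in batches}

# Fixed padding: every batch ends up with the same shape.
fixed_shapes = {batch_shape(pad_batch(b, max_length=8)) for b in batches}

print("distinct shapes with dynamic padding:", len(dynamic_shapes))
print("distinct shapes with fixed padding:", len(fixed_shapes))
```

With fixed padding there is only one shape to compile for, which is why I expected padding='max_length' to avoid this particular slowdown.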

Can someone tell me whether this is a bug or a mistake on my part? Most of the code is copied from the course docs.

Here is the notebook:
