Finetuning BERT on TPU is very slow

I followed the HuggingFace course and decided to try finetuning BERT on a random classification dataset. Everything works fine, except that training is extremely slow on TPUs. (I have already set padding='max_length', so the input shapes should be static and XLA should not need to recompile for every batch.)
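For context, this is the shape issue I'm trying to rule out — a minimal pure-Python sketch (the names `MAX_LENGTH`, `PAD_ID`, and `pad_to_max_length` are illustrative; the real code just passes padding='max_length' to the tokenizer). XLA compiles one program per input shape, so if sequence lengths vary between batches, the TPU recompiles over and over:

```python
# Sketch of why fixed-length padding matters on TPU: XLA specializes
# the compiled program to the input shape, so dynamically padded
# batches (shape changes every step) trigger repeated recompilation,
# while padding everything to one fixed length compiles once.

MAX_LENGTH = 128  # hypothetical value; in practice set via tokenizer(..., max_length=128)
PAD_ID = 0        # BERT's [PAD] token id

def pad_to_max_length(token_ids, max_length=MAX_LENGTH, pad_id=PAD_ID):
    """Truncate or right-pad a list of token ids to a fixed length."""
    ids = token_ids[:max_length]
    return ids + [pad_id] * (max_length - len(ids))

# Two sequences of different lengths...
batch = [[101, 2023, 102], [101, 2023, 2003, 1037, 3231, 102]]
padded = [pad_to_max_length(seq) for seq in batch]

# ...now share one static shape, so the TPU sees a single input signature.
assert all(len(row) == MAX_LENGTH for row in padded)
```

So in principle the shapes in my run should already be static, which is why the slowdown surprises me.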

Can someone tell me whether this is a bug or a mistake on my part? Most of the code was copied directly from the course docs.

Here is the notebook: