I followed the HuggingFace course and decided to try fine-tuning BERT on a random classification dataset. Everything works fine, except that training is extremely slow on TPUs. (I have also used padding='max_length'.)
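For context, the idea behind padding='max_length' can be sketched in plain Python. This is only an illustration, not the notebook's code: `pad_to_max_length`, `PAD_ID`, the token ids, and the length of 8 are all hypothetical. Padding every example to one fixed length keeps input shapes constant, which matters on TPUs because XLA compiles a program per input shape and recompiles when the shape changes.

```python
from typing import List

PAD_ID = 0  # hypothetical padding token id, chosen for illustration


def pad_to_max_length(batch: List[List[int]], max_length: int,
                      pad_id: int = PAD_ID) -> List[List[int]]:
    """Truncate or right-pad every sequence to exactly max_length tokens."""
    padded = []
    for ids in batch:
        ids = ids[:max_length]  # truncate sequences that are too long
        padded.append(ids + [pad_id] * (max_length - len(ids)))
    return padded


# Two illustrative token-id sequences of different lengths.
batch = [[101, 7592, 102], [101, 2023, 2003, 1037, 3231, 102]]
padded = pad_to_max_length(batch, max_length=8)

# Every row now has the same length, so the batch shape never varies.
shapes = {len(row) for row in padded}
print(shapes)  # {8}
```

With dynamic padding (padding each batch only to its longest sequence), the row length would differ from batch to batch, which is the situation fixed-length padding is meant to avoid.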
Can someone tell me whether this is a bug or a mistake on my part? Most of the code is copied from the course docs, though.
Here is the notebook: