Trainer with TPUs

Hello, I’ve been trying to train a BERT-based model for SQuAD with the Trainer API, and I would like to use TPUs from Colab to speed up training.
I have followed the official notebook for training a SQuAD model, and then the suggestions in this older topic:
Tpu Trainer
which says the only thing to change is to set a max_length parameter for the padding, which I already did.

So what I did is basically:

  • Install the xla library
  • Set max_length for padding
  • Set the TPU environment in Colab

But I don’t get any speedup for training. Am I missing something? This is my code:
My code
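For context, the reason the max_length padding step matters on TPU is that XLA recompiles the computation graph whenever input shapes change, so every batch should have the same shape. A minimal sketch of that idea (with the tokenizer this corresponds to padding="max_length"; the max_length=384 default here is just a common choice for SQuAD, not something from my setup):

```python
def pad_to_max_length(token_ids, max_length=384, pad_id=0):
    """Truncate or pad a list of token ids to a fixed length, so every
    batch fed to the TPU has the same shape and XLA compiles only once."""
    token_ids = token_ids[:max_length]
    return token_ids + [pad_id] * (max_length - len(token_ids))

# Short and long examples both come out at exactly max_length tokens.
short = pad_to_max_length([101, 2054, 2003, 102], max_length=8)
long = pad_to_max_length(list(range(20)), max_length=8)
```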

Edit: I made some progress! I have to restart the runtime right after installing the XLA library; otherwise there are compatibility issues with older versions.
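For reference, the install cell looks roughly like this (illustrative only: the exact torch_xla package/wheel has to match the torch version in the Colab runtime, so check the torch_xla README for the current install line):

```shell
# Install PyTorch/XLA support in Colab; package versions must match
# the torch version already present in the runtime.
pip install cloud-tpu-client torch_xla

# Important: restart the runtime (Runtime -> Restart runtime) before
# importing torch_xla, or older preinstalled versions cause conflicts.
```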
But I still didn’t get the speed-up I hoped for: where I get 0.8 iterations/second with a GPU, I only get 3.2 iterations/second with a TPU. Is that normal?
I also get the following output in the logs:

Num examples = 66107
Num Epochs = 3
Instantaneous batch size per device = 16
Total train batch size (w. parallel, distributed & accumulation) = 16

Is that normal? To my understanding, each TPU has 8 cores, so with a batch size of 16 per device, shouldn’t the total train batch size be 128? How do I use all the cores? Thanks
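Just to spell out my expectation as a quick sketch (assuming the total batch size in the log is the per-device batch size multiplied by the number of processes actually launched):

```python
per_device_batch_size = 16
tpu_cores = 8

# What the log should report if all 8 TPU cores were in use:
expected_total = per_device_batch_size * tpu_cores

# What the log actually reports, which would imply a single core:
reported_total = 16
processes_in_use = reported_total // per_device_batch_size
```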

Looking at the source code, it might be that you need to specify tpu_num_cores, assuming the notebook/script in your case does not set it automatically.
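With the example scripts, that would look roughly like the following (the --num_cores flag of the xla_spawn.py helper is what ends up as tpu_num_cores; script names, paths, and flags here are illustrative and depend on your transformers version):

```shell
# Launch training on all 8 TPU cores via the xla_spawn.py helper from
# the transformers examples; it forks one process per core.
python xla_spawn.py --num_cores 8 \
    run_qa.py \
    --model_name_or_path bert-base-uncased \
    --dataset_name squad \
    --do_train \
    --per_device_train_batch_size 16 \
    --pad_to_max_length \
    --output_dir ./out
```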

Thank you for the answer, Bram! Unfortunately that doesn’t seem to be it: I think the parameter defaults to 8, since even setting it manually doesn’t change anything.

In that case, I am not sure which other parameters might improve your speed further. Perhaps someone else can help!