Hello, I’ve been trying to train a BERT-based model on SQuAD with the Trainer API, and I would like to use the TPU available in Colab to speed up training.
I have followed the official notebook for training a SQuAD model, and then the suggestions in this older topic:
Tpu Trainer
which says the only change needed is setting a max_length parameter for the padding, which I already did.
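For context on why that max_length setting matters on TPU: XLA compiles a separate program for each distinct tensor shape it sees, so dynamically padded batches (a different sequence length every batch) trigger constant recompilation. A minimal pure-Python sketch of what fixed-length padding does (the function name and pad id below are my own, not the tokenizer’s):

```python
def pad_to_max_length(token_ids, max_length, pad_id=0):
    """Truncate or pad a sequence so every example has the same static shape.

    With dynamic padding each batch can have a different length and XLA
    would recompile for each new shape; padding everything to one fixed
    max_length avoids that.
    """
    ids = token_ids[:max_length]
    return ids + [pad_id] * (max_length - len(ids))

print(pad_to_max_length([101, 2054, 102], 6))  # → [101, 2054, 102, 0, 0, 0]
```

In practice this is what `padding="max_length"` in the tokenizer call achieves for you.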
So what I did is basically:
- Install the xla library
- Set max_length for padding
- Set TPU environment in COLAB
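A quick way to check that step 3 actually took effect is to look for the TPU address in the environment. This is a hedged sketch: `COLAB_TPU_ADDR` is the variable the older Colab TPU (node) runtimes expose, and newer TPU-VM runtimes may advertise the accelerator differently.

```python
import os

def tpu_runtime_available() -> bool:
    # The Colab TPU (node) runtime advertises the TPU's gRPC address in
    # this environment variable; if it is missing, the notebook is not
    # attached to a TPU and torch_xla has nothing to connect to.
    return "COLAB_TPU_ADDR" in os.environ

print("TPU runtime attached:", tpu_runtime_available())
```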
But I am not getting any training speedup. Am I missing something? This is my code:
My code
Edit: I made some progress! I have to restart the runtime right after installing the XLA library, otherwise there are compatibility issues with older versions.
But I still didn’t get the speed-up I hoped for: with a GPU I get 0.8 iterations/second, while with the TPU I only get 3.2 iterations/second. Is that normal?
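For what it’s worth, the numbers quoted above do already represent a real gain; a quick check of the arithmetic:

```python
# Throughput comparison from the numbers above.
gpu_its = 0.8   # iterations/second on the GPU
tpu_its = 3.2   # iterations/second on the TPU (apparently a single core)
speedup = tpu_its / gpu_its
print(f"TPU speedup over GPU: {speedup:.1f}x")  # → 4.0x
```

So the TPU run is about 4x faster per iteration; the disappointment is presumably that only one of the cores is doing the work.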
Also, the training log reports the following:
Num examples = 66107
Num Epochs = 3
Instantaneous batch size per device = 16
Total train batch size (w. parallel, distributed & accumulation) = 16
Is that normal? To my understanding, a TPU has 8 cores, so with a batch size of 16 per device, shouldn’t the total train batch size be 128? How do I use all the cores? Thanks
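On the batch-size question: the log line “Total train batch size = 16” suggests only one core is participating, since the multi-core launch needs one process per core (for example via torch_xla’s `xmp.spawn`, or the `xla_spawn.py` launcher shipped in the transformers examples; running a Trainer cell directly in a notebook stays single-process). The arithmetic below just sketches the expected numbers, assuming an 8-core Colab TPU:

```python
# Expected effective batch size once all TPU cores participate.
per_device_batch = 16
tpu_cores = 8  # a Colab TPU v2/v3 board has 8 cores
total_batch = per_device_batch * tpu_cores
print(total_batch)  # → 128
```

With all cores active, the log should report a total train batch size of 128 and each epoch should take roughly an eighth as many steps.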