How to use TPU for BERT training Colab

Hi there,

I’m trying to further pre-train a BERT model on my own very large dataset using the run_mlm.py script in transformers/examples/pytorch/language-modeling/ (sorry for not providing a link; 2-link limit).

I’m using a Colab TPU to speed up training, but training times seem incredibly long.

Here’s the code I’m using for training:

python /content/transformers/examples/legacy/seq2seq/xla_spawn.py --num_cores 8 /content/transformers/examples/pytorch/language-modeling/run_mlm.py \
--model_name_or_path bert-base-cased \
--validation_split_percentage 20 \
--line_by_line \
--do_train \
--do_eval \
--tpu_num_cores 8 \
--learning_rate 2e-5 \
--per_device_train_batch_size 64 \
--per_device_eval_batch_size 128 \
--num_train_epochs 4 \
--output_dir /content/output \
--train_file /content/text.txt \
--overwrite_output_dir

I installed torch and torch_xla and set XRT_TPU_CONFIG according to this.
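Concretely, the environment-variable setup looked roughly like this (a sketch: the `tpu_worker;<worker_id>;<host:port>` format is what torch_xla's XRT runtime expects, and `COLAB_TPU_ADDR` is the variable my Colab runtime exposed; the fallback address here is just a placeholder):

```python
import os

# Colab exposes the TPU gRPC address via COLAB_TPU_ADDR (e.g. "10.0.0.2:8470").
# The fallback below is a placeholder for illustration only.
tpu_addr = os.environ.get("COLAB_TPU_ADDR", "10.0.0.2:8470")

# torch_xla's XRT runtime expects "tpu_worker;<worker_id>;<host:port>".
os.environ["XRT_TPU_CONFIG"] = f"tpu_worker;0;{tpu_addr}"

print(os.environ["XRT_TPU_CONFIG"])
```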

I installed transformers from source, as required.

transformers version: 4.18.0.dev0
PyTorch version: 1.8.2+cpu

That’s the full output I’m getting.

As you can see, the estimated training time is 354 hours, which is ridiculously long. I’m guessing I’m not using the TPU correctly and, as a result, training is running on the CPU, but I’m not sure what the correct way to set it up is.
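To help diagnose this, here is a minimal check I can run to see which device torch_xla resolves (a sketch; the helper name is mine, and it assumes torch_xla may or may not import cleanly in the current environment):

```python
def xla_device_or_reason():
    """Return the XLA device string if torch_xla is usable, else a reason."""
    try:
        # xm.xla_device() returns e.g. xla:0 when a TPU is reachable.
        import torch_xla.core.xla_model as xm
        return str(xm.xla_device())
    except ImportError:
        # Without torch_xla, training silently falls back to CPU/GPU.
        return "torch_xla not installed"

print(xla_device_or_reason())
```

If this prints something other than an `xla:` device, the Trainer would indeed be running on CPU, which would explain the time estimate.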

Any help is much appreciated. Thanks.
