How to use TPU for BERT training Colab

Hi there,

I’m trying to further pre-train a BERT model on my own very large dataset using the run_mlm.py script in transformers/examples/pytorch/language-modeling/ (sorry for not providing a link; 2-link limit).

I’m using a Colab TPU to speed up training, but training times seem incredibly long.

Here’s the code I’m using for training:

python /content/transformers/examples/legacy/seq2seq/xla_spawn.py --num_cores 8 /content/transformers/examples/pytorch/language-modeling/run_mlm.py \
--model_name_or_path bert-base-cased \
--validation_split_percentage 20 \
--line_by_line \
--do_train \
--do_eval \
--tpu_num_cores 8 \
--learning_rate 2e-5 \
--per_device_train_batch_size 64 \
--per_device_eval_batch_size 128 \
--num_train_epochs 4 \
--output_dir /content/output \
--train_file /content/text.txt \
--overwrite_output_dir

I installed torch and torch_xla and set XRT_TPU_CONFIG according to this.
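Concretely, the environment-variable setup looked roughly like this (a sketch: the `tpu_worker;<worker_id>;<host:port>` format is what torch_xla's XRT runtime expects, and `COLAB_TPU_ADDR` is the variable my Colab runtime exposed; the fallback address here is just a placeholder):

```python
import os

# Colab exposes the TPU gRPC address via COLAB_TPU_ADDR (e.g. "10.0.0.2:8470").
# The fallback below is a placeholder for illustration only.
tpu_addr = os.environ.get("COLAB_TPU_ADDR", "10.0.0.2:8470")

# torch_xla's XRT runtime expects "tpu_worker;<worker_id>;<host:port>".
os.environ["XRT_TPU_CONFIG"] = f"tpu_worker;0;{tpu_addr}"

print(os.environ["XRT_TPU_CONFIG"])
```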

I installed transformers from source, as required.

transformers version: 4.18.0.dev0
PyTorch version: 1.8.2+cpu

That’s the full output I’m getting.

As you can see, the estimated training time is 354 hours, which is ridiculously long. I’m guessing I’m not using the TPU correctly and, as a result, training is running on the CPU, but I’m not sure what the correct way to set it up is.
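To help diagnose this, here is a minimal check I can run to see which device torch_xla resolves (a sketch; the helper name is mine, and it assumes torch_xla may or may not import cleanly in the current environment):

```python
def xla_device_or_reason():
    """Return the XLA device string if torch_xla is usable, else a reason."""
    try:
        # xm.xla_device() returns e.g. xla:0 when a TPU is reachable.
        import torch_xla.core.xla_model as xm
        return str(xm.xla_device())
    except ImportError:
        # Without torch_xla, training silently falls back to CPU/GPU.
        return "torch_xla not installed"

print(xla_device_or_reason())
```

If this prints something other than an `xla:` device, the Trainer would indeed be running on CPU, which would explain the time estimate.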

Any help is much appreciated. Thanks.
