I’m trying to further pre-train a BERT model on my own very large dataset using the
run_mlm.py script in
transformers/examples/pytorch/language-modeling/. (Sorry for not providing a link - 2-link limit.)
I’m using a Colab TPU to speed up training, but training times seem incredibly long.
Here’s the code I’m using for training:
python /content/transformers/examples/legacy/seq2seq/xla_spawn.py --num_cores 8 \
  /content/transformers/examples/pytorch/language-modeling/run_mlm.py \
  --model_name_or_path bert-base-cased \
  --validation_split_percentage 20 \
  --line_by_line \
  --do_train \
  --do_eval \
  --tpu_num_cores 8 \
  --learning_rate 2e-5 \
  --per_device_train_batch_size 64 \
  --per_device_eval_batch_size 128 \
  --num_train_epochs 4 \
  --output_dir /content/output \
  --train_file /content/text.txt \
  --overwrite_output_dir
I installed torch and torch_xla, and set XRT_TPU_CONFIG according to this.
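In case it matters, this is roughly how I’m building the config string from Colab’s COLAB_TPU_ADDR variable (just a sketch; build_xrt_config is my own helper name, not part of torch_xla):

```python
import os

def build_xrt_config(tpu_addr):
    # Build the XRT_TPU_CONFIG string torch_xla expects for a single
    # Colab TPU worker. build_xrt_config is my own helper, not a library call.
    return f"tpu_worker;0;{tpu_addr}"

# In a Colab TPU runtime, the worker address is exposed via COLAB_TPU_ADDR,
# e.g. "10.0.0.2:8470".
tpu_addr = os.environ.get("COLAB_TPU_ADDR")
if tpu_addr:
    os.environ["XRT_TPU_CONFIG"] = build_xrt_config(tpu_addr)
```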
I installed transformers from source, as required.
That’s the full output I’m getting.
As you can see, the estimated training time is 354 hours, which is ridiculously long. I’m guessing I’m not initializing the TPU correctly and training is actually running on the CPU, but I’m not sure what the right way to set it up is.
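For what it’s worth, this is the kind of sanity check I’ve been running to see whether the TPU environment is even visible before launching training (diagnose_tpu_env is just a name I made up; a fuller check would also call torch_xla.core.xla_model.xla_device() and see whether it returns an XLA device or raises):

```python
import os

def diagnose_tpu_env(env):
    # Rough sanity check of the TPU-related environment variables.
    # diagnose_tpu_env is my own helper, not part of torch_xla.
    if env.get("XRT_TPU_CONFIG"):
        return "XRT_TPU_CONFIG set: " + env["XRT_TPU_CONFIG"]
    if env.get("COLAB_TPU_ADDR"):
        return "COLAB_TPU_ADDR present but XRT_TPU_CONFIG unset"
    return "no TPU env vars found; training will likely fall back to CPU"

print(diagnose_tpu_env(dict(os.environ)))
```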
Any help is much appreciated. Thanks.