Stuck on tokenization before training when using 3 GPUs, but not when using 2 GPUs

I intend to use run_mlm.py to train RoBERTa from scratch. I have 3 A100s on my machine, so I entered the following command:

CUDA_VISIBLE_DEVICES=0,1,2 python run_mlm.py \
    --model_type roberta \
    --config_overrides="num_hidden_layers=6,max_position_embeddings=514" \
    --tokenizer_name MyModel \
    --train_file ./data/corpus_dedup.txt \
    --max_seq_length 512 \
    --line_by_line True \
    --per_device_train_batch_size 64 \
    --do_train \
    --overwrite_output_dir True \
    --gradient_accumulation_steps 4 \
    --num_train_epochs 40 \
    --fp16 True \
    --output_dir MyModel \
    --save_total_limit 1

When I try to run the training with the 3-GPU configuration, it gets stuck for dozens of hours in the tokenization step before training, with the following message:

You're using a RobertaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.

Additionally, when I try to run the training with only 2 GPUs (CUDA_VISIBLE_DEVICES=0,1, followed by the same parameters), the training runs normally :thinking:. What can be done about this? I would really like to use all the GPUs and have fewer training steps.
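
For context, my understanding is that the step that hangs is the dataset tokenization done inside run_mlm.py. Below is a minimal sketch of how that tokenization could be done offline instead, assuming a datasets-based workflow; the num_proc value and the save_to_disk path are just illustrative, the rest mirrors my command above:

# Rough sketch: tokenize the corpus once, outside run_mlm.py.
# Tokenizer name, corpus path, and max length mirror the command above;
# num_proc and the output path are only illustrative.
from datasets import load_dataset
from transformers import RobertaTokenizerFast

tokenizer = RobertaTokenizerFast.from_pretrained("MyModel")
raw = load_dataset("text", data_files={"train": "./data/corpus_dedup.txt"})

def tokenize(batch):
    # line_by_line-style tokenization, truncated to max_seq_length=512
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw.map(
    tokenize,
    batched=True,
    num_proc=8,               # parallel tokenization workers (illustrative)
    remove_columns=["text"],
)
tokenized.save_to_disk("./data/corpus_dedup_tokenized")  # illustrative path

If I understand correctly, run_mlm.py also exposes a --preprocessing_num_workers option that parallelizes this same map step, but I would still prefer to keep using the script end to end if the 3-GPU hang can be avoided.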