How to run an end-to-end example of distributed data parallel with Hugging Face's Trainer API (ideally on a single node with multiple GPUs)?

Hi @muellerzr ,

thanks for providing these useful statements. I am using the following setup:

torchrun --nproc_per_node 2 train_xxx.py

which is basically derived from nlp_example.py.

All I actually changed is the tokenize function and the dataset. After starting the script, the model is downloaded and everything starts properly; nvidia-smi shows both GPUs at approx. 80% usage, so far so good.
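
For context, the tokenize step looks roughly like this (a simplified sketch; the checkpoint, column names, file paths, and max length are placeholders rather than the exact values from my script):

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Placeholder checkpoint -- my real script uses its own model and dataset.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

def tokenize_function(examples):
    # Tokenize the raw text column; the truncation length is illustrative.
    return tokenizer(examples["text"], truncation=True, max_length=128)

raw_datasets = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})
tokenized_datasets = raw_datasets.map(
    tokenize_function,
    batched=True,
    remove_columns=["text"],
)
```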

What worries me now is that the things I have been logging so far, e.g. the size of the dataset, are printed twice:

2023-03-31 08:14:23.354 | DEBUG    | __main__:get_dataloaders:79 - DatasetDict({
    train: Dataset({
        features: ['input_ids', 'attention_mask', 'labels'],
        num_rows: 7500
    })
    test: Dataset({
        features: ['input_ids', 'attention_mask', 'labels'],
        num_rows: 2500
    })
})
You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
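
I assume the duplicate lines simply come from both processes logging independently. If so, I could probably restrict the output to the main process, something like this (a sketch based on my understanding of the Accelerate API; the logged message is just an example):

```python
from accelerate import Accelerator
from loguru import logger

accelerator = Accelerator()

# Only emit debug output on the main process so it appears once, not once per GPU.
if accelerator.is_main_process:
    logger.debug("dataset sizes: train=7500, test=2500")

# Alternatively, accelerator.print() is a print() that is silent on non-main processes.
accelerator.print("this line is printed only by the main process")
```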

My question is: how can I make sure that the training actually runs distributed across both GPUs, rather than running independently once on each GPU?
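
To sanity-check this myself, I was thinking of printing the distributed state at the start of the script, something like the sketch below; with two processes I would expect ranks 0 and 1, each bound to its own GPU:

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()

# With torchrun --nproc_per_node 2 I would expect num_processes == 2 and two
# different process_index values (0 and 1), each on its own device.
print(
    f"process_index={accelerator.process_index} "
    f"num_processes={accelerator.num_processes} "
    f"device={accelerator.device} "
    f"distributed_type={accelerator.distributed_type}"
)

# The underlying torch.distributed state should agree with that.
if torch.distributed.is_initialized():
    print(f"rank={torch.distributed.get_rank()} world_size={torch.distributed.get_world_size()}")
```

Is that the right way to verify it, or is there something else I should check?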

Kind regards
Julian