It’s actually very simple and straightforward to do this. The Trainer will automatically pick up the number of devices you want to use, and any of the examples in transformers/examples/pytorch can be run on multiple GPUs automatically; Hugging Face fully supports DDP there.
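The key point is that the training script itself needs no multi-GPU code. As an illustration, here’s a minimal sketch of what such a Trainer script boils down to (a condensed, hypothetical stand-in, not the actual run_glue.py):

```python
# Minimal sketch of a Trainer script; note there is nothing multi-GPU-specific
# in it. Trainer reads the distributed environment the launcher sets up.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Any small text-classification dataset works; MRPC mirrors the example below.
dataset = load_dataset("glue", "mrpc")
tokenized = dataset.map(
    lambda batch: tokenizer(batch["sentence1"], batch["sentence2"], truncation=True),
    batched=True,
)

args = TrainingArguments(
    output_dir="outputs",
    per_device_train_batch_size=2,  # per GPU: with 2 GPUs, effective batch of 4
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()  # DDP kicks in automatically when launched with torchrun
```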
In my example I’ll use the text classification one.
To check that it’s using both GPUs the whole time, I’ll run watch -n0.1 nvidia-smi in a separate terminal.
Now I’m assuming we’re running this through a clone of the repo, so the args will be set up similarly to how the tests are done:
```bash
torchrun --nproc_per_node 2 examples/pytorch/text-classification/run_glue.py \
  --model_name_or_path distilbert-base-uncased \
  --output_dir outputs \
  --train_file ./tests/fixtures/tests_samples/MRPC/train.csv \
  --validation_file ./tests/fixtures/tests_samples/MRPC/dev.csv \
  --do_train --do_eval \
  --per_device_train_batch_size=2 \
  --per_device_eval_batch_size=1
```
And that’s all it takes: just launch it like normal via torchrun! You’ll see that both GPUs get utilized immediately.
If you want to avoid dealing with torchrun yourself, mix it in with our Accelerate library: run accelerate config once, and then all you have to do is launch the script via:
```bash
accelerate launch examples/pytorch/text-classification/run_glue.py \
  --model_name_or_path distilbert-base-uncased \
  --output_dir outputs \
  --train_file ./tests/fixtures/tests_samples/MRPC/train.csv \
  --validation_file ./tests/fixtures/tests_samples/MRPC/dev.csv \
  --do_train --do_eval \
  --per_device_train_batch_size=2 \
  --per_device_eval_batch_size=1
```
The config you set will wrap around all the complicated torchrun bits, so you don’t need to handle any of that yourself. Even if you don’t use Accelerate for any actual training code, you can still use its launcher this way.
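For reference, accelerate config writes its answers to a small YAML file (by default under ~/.cache/huggingface/accelerate/). The exact fields vary by Accelerate version, but for two GPUs on one machine it looks roughly like this sketch:

```yaml
# Sketch of a generated default_config.yaml for 2 GPUs on one machine;
# exact fields depend on your Accelerate version.
compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
machine_rank: 0
main_training_function: main
mixed_precision: 'no'
num_machines: 1
num_processes: 2
use_cpu: false
```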
Not entirely sure what Cherry is, but Accelerate is a lower-level library capable of doing DDP across single-node and multi-node GPUs and TPUs, so any and all forms of DDP. But again, the Trainer handles all of that magic for you in that regard; you just need to launch it the right way.
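If you ever do want that lower level, the core pattern is small. Here’s a minimal sketch of a raw Accelerate training loop (toy model and data, purely illustrative):

```python
# Minimal sketch of a raw Accelerate loop (toy model/data, illustrative only).
# The same script runs on 1 GPU, multi-GPU DDP, or TPU depending on the launcher.
import torch
from accelerate import Accelerator

accelerator = Accelerator()

model = torch.nn.Linear(10, 2)  # stand-in for a real model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataset = torch.utils.data.TensorDataset(
    torch.randn(64, 10), torch.randint(0, 2, (64,))
)
dataloader = torch.utils.data.DataLoader(dataset, batch_size=8)

# prepare() moves everything to the right device(s) and wraps the model
# in DistributedDataParallel when launched across multiple processes
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for inputs, targets in dataloader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(inputs), targets)
    accelerator.backward(loss)  # replaces loss.backward()
    optimizer.step()
```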