How do I run an end-to-end example of distributed data parallel (DDP) with Hugging Face's Trainer API, ideally on a single node with multiple GPUs?

The Trainer handles DDP for you as long as you launch the script with torchrun; this is the command:

torchrun --nproc_per_node 2 my_script.py
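For reference, here is a minimal sketch of what `my_script.py` could look like so the command above runs end to end. The model (`bert-base-uncased`), dataset (`imdb`), and hyperparameters are illustrative assumptions, not part of the original answer:

```python
# my_script.py -- minimal Trainer example; model, dataset, and
# hyperparameters below are placeholder choices for illustration.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

def main():
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

    # Tiny slice of a public dataset, purely to keep the example fast.
    dataset = load_dataset("imdb", split="train[:1%]")
    dataset = dataset.map(
        lambda ex: tokenizer(
            ex["text"], truncation=True, padding="max_length", max_length=128
        ),
        batched=True,
    )

    args = TrainingArguments(
        output_dir="out",
        per_device_train_batch_size=8,  # per-GPU; effective batch = 8 * nproc_per_node
        num_train_epochs=1,
    )

    # When launched with torchrun, Trainer picks up the LOCAL_RANK /
    # WORLD_SIZE environment variables, wraps the model in
    # DistributedDataParallel, and shards the data across processes.
    trainer = Trainer(model=model, args=args, train_dataset=dataset)
    trainer.train()

if __name__ == "__main__":
    main()
```

Note that no DDP-specific code is needed in the script itself: torchrun sets the distributed environment variables for each process, and the Trainer configures process groups, model wrapping, and data sharding from them. The same script also runs on a single GPU when launched with plain `python my_script.py`.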