Hugging Face Forums
How to run an end-to-end example of distributed data parallel with Hugging Face's Trainer API (ideally on a single node with multiple GPUs)?
Intermediate
brando
August 17, 2022, 7:22pm
OK, this is the command:
torchrun --nproc_per_node 2 my_script.py
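For completeness, here is a minimal sketch of what `my_script.py` might look like. The model name, dataset, and hyperparameters below are illustrative assumptions, not something from the original post; the point is just that a plain Trainer script needs no DDP-specific code when launched with `torchrun`.

```python
# my_script.py -- minimal sketch; model, dataset, and hyperparameters are
# assumptions chosen only to make the example self-contained.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)


def main():
    model_name = "bert-base-uncased"  # assumed model, for illustration only
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

    # Tiny slice of a public dataset, purely for demonstration.
    dataset = load_dataset("imdb", split="train[:1%]")

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

    dataset = dataset.map(tokenize, batched=True)

    args = TrainingArguments(
        output_dir="out",
        per_device_train_batch_size=8,  # per GPU; effective batch = 8 * nproc_per_node
        num_train_epochs=1,
    )

    # When launched with torchrun, Trainer picks up LOCAL_RANK / WORLD_SIZE from
    # the environment and wraps the model in DistributedDataParallel itself.
    trainer = Trainer(model=model, args=args, train_dataset=dataset)
    trainer.train()


if __name__ == "__main__":
    main()
```

Launching it with `torchrun --nproc_per_node 2 my_script.py` spawns one process per GPU, and the Trainer shards each batch across the two processes automatically.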