How to run an end-to-end example of distributed data parallel with Hugging Face's Trainer API (ideally on a single node with multiple GPUs)?

Yes, quite so. For example, it's currently what's used for fastai's entire distributed module. If it's PyTorch, it can be done with Accelerate.
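
To make that concrete, here is a minimal sketch of wrapping a plain PyTorch loop with Accelerate. The toy model, random dataset, and hyperparameters are placeholders, not anything from the docs; the point is just where `prepare()` and `accelerator.backward()` go.

```python
# Launch with something like: accelerate launch train_loop.py
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()  # picks up the distributed setup from `accelerate launch`

# Toy model and data, purely for illustration
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataset = TensorDataset(torch.randn(128, 10), torch.randint(0, 2, (128,)))
dataloader = DataLoader(dataset, batch_size=16, shuffle=True)

# prepare() wraps the model for DDP and shards the dataloader across processes
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

model.train()
for epoch in range(3):
    for inputs, targets in dataloader:
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(inputs), targets)
        accelerator.backward(loss)  # replaces loss.backward()
        optimizer.step()
```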

The Accelerate repo has a CV example using timm.

Not right now, no, since Trainer already handles internally all the DDP that Accelerate could do. But as mentioned earlier, you can still use accelerate to launch those scripts :slight_smile:
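
In practice that means a plain Trainer script needs no DDP-specific code; you just start one process per GPU with a launcher. Below is a hedged sketch: the model (`bert-base-uncased`), dataset (`imdb`), and all hyperparameters are just example choices, and the launch commands in the comments assume 2 GPUs on a single node.

```python
# train.py
# Single node, multiple GPUs — launch with either of:
#   torchrun --nproc_per_node=2 train.py
#   accelerate launch --multi_gpu --num_processes 2 train.py
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "bert-base-uncased"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

dataset = load_dataset("imdb")  # example dataset

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=8,  # per-GPU batch size under DDP
    num_train_epochs=1,
)

# Trainer detects the distributed environment set up by the launcher
# and wraps the model in DDP on its own.
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"].shuffle(seed=42).select(range(2000)),
)
trainer.train()
```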
