Hi, I am new to distributed training and am using Hugging Face to train large models. I see several options for running distributed training. Could someone explain the difference between the following options:
1. `python train.py <ARGS>`
2. `python -m torch.distributed.launch <ARGS>`
3. `deepspeed train.py <ARGS>`
4. Hugging Face Accelerate (`accelerate launch`)
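
For context, here is my rough sketch of what I understand a script launched with option 2 does under the hood (the model is just a placeholder, and exactly how the local rank is passed depends on the PyTorch version):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# torch.distributed.launch starts one process per GPU; depending on the
# version, the local rank arrives as a --local_rank argument or as the
# LOCAL_RANK environment variable.
local_rank = int(os.environ.get("LOCAL_RANK", 0))
torch.cuda.set_device(local_rank)
dist.init_process_group(backend="nccl")

model = torch.nn.Linear(10, 10).cuda(local_rank)  # placeholder model
model = DDP(model, device_ids=[local_rank])
# each process then trains on its own shard of the data and DDP
# all-reduces the gradients across processes
```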
I did not expect option 1 to use distributed training, but it seems to use some form of torch distributed/multi-GPU training anyway. In that case, what is the difference between option 1 and option 2?
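
This is roughly the diagnostic I had in mind to see what option 1 is actually doing inside train.py (just a sketch):

```python
import os
import torch
import torch.distributed as dist

print("visible GPUs:", torch.cuda.device_count())
print("LOCAL_RANK:", os.environ.get("LOCAL_RANK"))
print("torch.distributed initialized:",
      dist.is_available() and dist.is_initialized())
# With plain `python train.py` I would expect a single process that can see
# all GPUs, while option 2 starts one process per GPU.
```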
Does DeepSpeed use `torch.distributed` in the background?
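
For context, this is the kind of minimal DeepSpeed usage I mean (the config file name and model are placeholders):

```python
import torch
import deepspeed

model = torch.nn.Linear(10, 10)  # placeholder model

# `deepspeed train.py` also launches one process per GPU; this call wraps
# the model into a DeepSpeed engine that handles the distributed details.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config="ds_config.json",  # placeholder config path
)
```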
Also, the Hugging Face Trainer already seems to use torch-based distributed training by default, so what is the difference when using Accelerate?
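
And this is my rough understanding of the Accelerate version of the same thing (model and optimizer are placeholders):

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()

model = torch.nn.Linear(10, 10)                          # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # placeholder optimizer

# prepare() moves everything to the right device and wraps the model to
# match however the script was launched (single GPU, DDP, etc.), e.g. via
# `accelerate launch train.py` after running `accelerate config`.
model, optimizer = accelerator.prepare(model, optimizer)
```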