Hi, I am new to distributed training and am using Hugging Face to train large models. I see several options for running distributed training. Could someone explain the difference between the following options:
1. `python train.py <ARGS>`
2. `python -m torch.distributed.launch <ARGS>`
3. `deepspeed train.py <ARGS>`
4. Hugging Face Accelerate (`accelerate launch`)
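
For context, here is my rough sketch of what I understand a script launched with option 2 does under the hood (the model is just a placeholder, and exactly how the local rank is passed depends on the PyTorch version):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# torch.distributed.launch starts one process per GPU; depending on the
# version, the local rank arrives as a --local_rank argument or as the
# LOCAL_RANK environment variable.
local_rank = int(os.environ.get("LOCAL_RANK", 0))
torch.cuda.set_device(local_rank)
dist.init_process_group(backend="nccl")

model = torch.nn.Linear(10, 10).cuda(local_rank)  # placeholder model
model = DDP(model, device_ids=[local_rank])
# each process then trains on its own shard of the data and DDP
# all-reduces the gradients across processes
```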
I did not expect option 1 to use distributed training, but it seems to use some form of torch distributed/multi-GPU training anyway. In that case, what is the difference between option 1 and option 2?
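
This is roughly the diagnostic I had in mind to see what option 1 is actually doing inside train.py (just a sketch):

```python
import os
import torch
import torch.distributed as dist

print("visible GPUs:", torch.cuda.device_count())
print("LOCAL_RANK:", os.environ.get("LOCAL_RANK"))
print("torch.distributed initialized:",
      dist.is_available() and dist.is_initialized())
# With plain `python train.py` I would expect a single process that can see
# all GPUs, while option 2 starts one process per GPU.
```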
Does DeepSpeed use `torch.distributed` in the background?
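
For context, this is the kind of minimal DeepSpeed usage I mean (the config file name and model are placeholders):

```python
import torch
import deepspeed

model = torch.nn.Linear(10, 10)  # placeholder model

# `deepspeed train.py` also launches one process per GPU; this call wraps
# the model into a DeepSpeed engine that handles the distributed details.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config="ds_config.json",  # placeholder config path
)
```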
Also, the Hugging Face Trainer already seems to use torch-based distributed training by default, so what is the difference when using Accelerate?
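
And this is my rough understanding of the Accelerate version of the same thing (model and optimizer are placeholders):

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()

model = torch.nn.Linear(10, 10)                          # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # placeholder optimizer

# prepare() moves everything to the right device and wraps the model to
# match however the script was launched (single GPU, DDP, etc.), e.g. via
# `accelerate launch train.py` after running `accelerate config`.
model, optimizer = accelerator.prepare(model, optimizer)
```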