Using deepspeed script launcher vs accelerate script launcher for TRL

I’ve been trying to figure out the nature of the deepspeed integration, especially with respect to huggingface accelerate.

It seems that the trainer uses accelerate to facilitate deepspeed. But when I look at the documentation, it seems that we still use deepspeed as the launcher, or the pytorch distributed launcher:

deepspeed --num_gpus=2 your_program.py <normal cl args> --deepspeed ds_config.json

or

python -m torch.distributed.launch --nproc_per_node=2 your_program.py <normal cl args>

But it didn’t mention using the accelerate launcher. I’m confused by this since trainer uses accelerate to facilitate the deepspeed integration.

As for the TRL library, it seems that it uses the accelerate library for its trainers as well, but there the official way to launch a script is to use the accelerate launcher:

accelerate launch --config_file=examples/accelerate_configs/deepspeed_zero{1,2,3}.yaml --num_processes {NUM_GPUS} path_to_script.py --all_arguments_of_the_script

I’m wondering if it’s still interchangeable with the deepspeed launcher, and if not, what the nature of the integration is.

It seems that the trainer uses accelerate to facilitate deepspeed. But when I look at the documentation, it seems that we still use deepspeed as the launcher, or the pytorch distributed launcher:

deepspeed --num_gpus=2 your_program.py <normal cl args> --deepspeed ds_config.json

or

python -m torch.distributed.launch --nproc_per_node=2 your_program.py <normal cl args>

You can still launch the script with the deepspeed or pytorch distributed command. However, since the trainer refactor, the trainer backend relies entirely on accelerate. The accelerate launch command is there to simplify your life: you only need to remember one command, and it’s easy to set up the parameters using accelerate config.
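For example (a rough sketch, assuming a recent accelerate version), you can answer the deepspeed questions once in the interactive setup and then reuse the same launch command for every run:

accelerate config

accelerate launch your_program.py <normal cl args>

If you prefer not to go through the interactive setup, accelerate launch also accepts a --config_file pointing at a saved accelerate config, as in the TRL command above.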


But it didn’t mention using the accelerate launcher. I’m confused by this since trainer uses accelerate to facilitate the deepspeed integration.

Check the doc here! But I agree with you, this section is well hidden :sweat_smile:.

As for the TRL library, it seems that it uses the accelerate library for its trainers as well, but there the official way to launch a script is to use the accelerate launcher:

The TRL trainers rely on the transformers Trainer. This is why you can use accelerate there as well. We should probably update our docs to put more emphasis on accelerate, indeed.
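For reference, those deepspeed_zero{1,2,3}.yaml files are just ordinary accelerate config files with distributed_type: DEEPSPEED. A rough, abbreviated sketch of what the ZeRO-2 one looks like (check the actual file in the TRL repo for the exact contents):

# abbreviated sketch, not the exact file shipped with TRL
compute_environment: LOCAL_MACHINE
distributed_type: DEEPSPEED
deepspeed_config:
  zero_stage: 2
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: false
mixed_precision: bf16
num_machines: 1
num_processes: 8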

I’m wondering if it’s still interchangeable with the deepspeed launcher, and if not, what the nature of the integration is.

Under the hood, accelerate launch uses the deepspeed launcher. Same for pytorch distributed.
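In other words, which low-level launcher you end up on depends on the distributed backend you picked. A rough sketch (flag names may vary slightly between accelerate versions):

accelerate launch --use_deepspeed --num_processes=2 your_program.py <normal cl args>

accelerate launch --multi_gpu --num_processes=2 your_program.py <normal cl args>

The first form hands off to the deepspeed launcher, the second to the pytorch distributed launcher (torch.distributed.run / torchrun).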

Thanks for the detailed explanation

Under the hood, accelerate launch uses the deepspeed launcher. Same for pytorch distributed.

Ah, that clears up a lot for me. For ‘Same for pytorch distributed.’, does that mean accelerate launch uses pytorch distributed? Or pytorch distributed uses the deepspeed launcher?

Yes, accelerate launch uses pytorch distributed when deepspeed isn’t enabled; you can see it in this line here. Also see the pytorch doc.

