Exact difference between Transformers' and Accelerate's DeepSpeed integrations?

jkkim · December 23, 2023, 9:17pm

Hello,

I’m trying to use DeepSpeed with Transformers, and I see there are two DeepSpeed integrations documented on HF:
(a) Transformers’ DeepSpeed integration: DeepSpeed Integration
(b) Accelerate’s DeepSpeed integration: DeepSpeed

However, I’m a bit confused by these two.
They have separate documentation, but are they really two completely separate integrations?

After examining the code, I realized that Accelerator._prepare_deepspeed calls deepspeed.initialize, while nothing on the Transformers side calls deepspeed.initialize. That seems to contradict documentation (a), since the “Trainer Deepspeed Integration” shouldn’t require the user to call deepspeed.initialize manually.

Could someone clarify it for me, please?

Thank you!

muellerzr · December 23, 2023, 9:34pm

The Trainer uses Accelerate under the hood, so deepspeed.initialize is called when the Trainer calls accelerator.prepare().

jkkim · December 23, 2023, 9:45pm

Thank you for your response.

Does that mean Transformers’ DeepSpeed integration relies on Accelerate’s DeepSpeed integration?

muellerzr · December 23, 2023, 9:46pm

Yes indeed

SantoshScienceIO · December 25, 2023, 1:59am

So in an example where the script includes:

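from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments

# model_name is assumed to be defined earlier in the script
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

training_args = TrainingArguments(
    ...
    deepspeed='ds_config.json'
)

trainer = Trainer(
    model=model,
    tokenizer=tokenizer,
    args=training_args
)
trainer.train()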

We would use

deepspeed train.py

instead of

accelerate launch --config_file config.yaml train.py

?

Or would those two be equivalent?

muellerzr · February 13, 2024, 10:43pm

Those two are equivalent.
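For anyone following along, the script above refers to a ds_config.json that is never shown in the thread. Here is a minimal sketch of how such a file could be generated, not the config anyone in this thread actually used. It assumes ZeRO stage 2 purely for illustration, and it uses the "auto" placeholders that the Transformers DeepSpeed documentation describes, which let the Trainer fill in those values from TrainingArguments.

```python
import json
from pathlib import Path

# Illustrative config only: ZeRO stage 2 is an arbitrary choice for this sketch.
# The "auto" values are placeholders that the Trainer resolves from
# TrainingArguments (batch size, gradient accumulation) at start-up.
ds_config = {
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "zero_optimization": {"stage": 2},
}

# Write the file next to the training script, matching the name used
# in TrainingArguments(deepspeed='ds_config.json').
config_path = Path("ds_config.json")
config_path.write_text(json.dumps(ds_config, indent=2))

# Read it back to confirm the file is valid JSON.
loaded = json.loads(config_path.read_text())
print(loaded["zero_optimization"]["stage"])
```

With a file like this in place, either launch command from the post above should pick it up through the deepspeed argument of TrainingArguments.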
