Exact difference between Transformers' and Accelerate's DeepSpeed integrations?

Hello,

I’m trying to use DeepSpeed with Transformers, and I see there are two DeepSpeed integrations documented on HF:
(a) Transformers’ DeepSpeed integration: DeepSpeed Integration
(b) Accelerate’s DeepSpeed integration: DeepSpeed

However, I’m a bit confused by these two.
They have separate documentation pages, but are they really two completely separate integrations?

After examining the code, I've realized that Accelerator._prepare_deepspeed calls deepspeed.initialize, while there is no call to deepspeed.initialize on Transformers' side. This seems to contradict documentation (a), since the Trainer's DeepSpeed integration shouldn't require the user to call deepspeed.initialize manually.

Could someone clarify this for me, please?

Thank you!

The Trainer uses Accelerate under the hood, so deepspeed.initialize is called when the Trainer calls accelerator.prepare().
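For the curious, here is a rough sketch (paraphrased, not the actual library code) of what that call chain does once a DeepSpeed plugin is active:

import deepspeed

# Sketch of Trainer.train() -> accelerator.prepare(...)
#   -> Accelerator._prepare_deepspeed(...)   (names paraphrased)
def _prepare_deepspeed_sketch(model, optimizer, lr_scheduler, ds_config):
    # Accelerate merges the user's DeepSpeed config (e.g. ds_config.json)
    # with values derived from TrainingArguments, then makes the single
    # deepspeed.initialize call on the user's behalf.
    engine, ds_optimizer, _, ds_scheduler = deepspeed.initialize(
        model=model,
        optimizer=optimizer,
        lr_scheduler=lr_scheduler,
        config=ds_config,
    )
    # The returned engine wraps the model; training then runs through
    # engine.backward() / engine.step() via Accelerate's wrappers.
    return engine, ds_optimizer, ds_scheduler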


Thank you for your response.

Does that mean Transformers’ DeepSpeed integration relies on Accelerate’s DeepSpeed integration?

Yes indeed
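A rough sketch of that dependency (an assumption about the wiring, with paraphrased names, not the exact transformers code): TrainingArguments.deepspeed is parsed into an accelerate DeepSpeedPlugin, which the Trainer hands to the Accelerator it creates internally:

from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

def build_accelerator_sketch(ds_config_path):
    # The Trainer turns TrainingArguments.deepspeed (a path or dict)
    # into a DeepSpeed plugin for Accelerate...
    plugin = DeepSpeedPlugin(hf_ds_config=ds_config_path)
    # ...and passes it to its internal Accelerator, which later makes
    # the deepspeed.initialize call inside prepare().
    return Accelerator(deepspeed_plugin=plugin)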


So, in an example where the script includes:

from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments

# model_name is assumed to be defined earlier in the script
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

training_args = TrainingArguments(
    ...,
    deepspeed='ds_config.json',
)

trainer = Trainer(
    model=model,
    tokenizer=tokenizer,
    args=training_args,
)
trainer.train()
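(Side note before the launcher question: TrainingArguments.deepspeed also accepts an already-loaded config dict instead of a file path. A minimal sketch with placeholder values; output_dir and the ZeRO stage here are assumptions:)

training_args = TrainingArguments(
    output_dir="out",  # hypothetical
    deepspeed={
        # "auto" lets the Trainer fill these in from TrainingArguments
        "train_micro_batch_size_per_gpu": "auto",
        "gradient_accumulation_steps": "auto",
        "zero_optimization": {"stage": 2},  # placeholder: pick your ZeRO stage
        "fp16": {"enabled": "auto"},
    },  # same effect as deepspeed='ds_config.json' with these contents
)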

Would we use

deepspeed train.py

instead of

accelerate launch --config_file config.yaml train.py

?

Or would those two be equivalent?

Equivalent. Both launchers just spawn the worker processes and set up the distributed environment; in both cases the Trainer enables DeepSpeed from the deepspeed argument in TrainingArguments.