I want to use Transformers with DeepSpeed, and it seems that the two main ways are to use it either with the Trainer or with Accelerate.

The only difference I can find is that you should use Accelerate only if you want to write your own training loop. Besides writing your own training loop, is there any other advantage to using Accelerate with DeepSpeed?

Is that the only difference?

Correct; the Trainer now runs on Accelerate under the hood.
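In practice that means with the Trainer you only write a DeepSpeed config file and point `TrainingArguments(deepspeed=...)` at it; the Trainer hands it off to Accelerate internally. A minimal ZeRO stage-2 sketch (the file name and specific values here are illustrative, and the `"auto"` values are filled in by the Transformers integration):

```json
{
  "bf16": { "enabled": "auto" },
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true
  },
  "gradient_accumulation_steps": "auto",
  "train_micro_batch_size_per_gpu": "auto"
}
```

Saving this as e.g. `ds_config.json` and passing `TrainingArguments(deepspeed="ds_config.json")`, then launching with `accelerate launch` or `deepspeed`, is enough — no custom loop needed.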


@muellerzr does that mean that if Accelerate is configured via `accelerate config` and we load the model using

    import torch
    import transformers

    model = transformers.AutoModelForCausalLM.from_pretrained(
        model_args.model_name,
        torch_dtype=torch.bfloat16,
        trust_remote_code=True,
        device_map="auto",
    )

then trainer.train() will use Accelerate on the back end to train/fine-tune the model on multiple GPUs, and there's no need to write a custom training loop?