What arguments need to be changed when using DeepSpeed in the Trainer?

I understand that the regular launch command `python run_trainer.py ...` needs to change to `deepspeed --hostfile <hostfile> run_trainer.py ... --deepspeed <deepspeed_config_file>`.

Besides the `--deepspeed` argument, is there anything else I should change, for example `sharded_ddp`, `ddp_find_unused_parameters`, or `skip_memory_metrics`?
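For concreteness, here is a minimal sketch of the change I mean. The config contents and filenames below are placeholder assumptions for illustration, not values taken from the docs:

```shell
# Write a minimal (hypothetical) DeepSpeed config file; the exact fields
# depend on your setup. This example just enables ZeRO stage 2 and lets
# the HF integration fill in the batch size ("auto").
cat > ds_config.json <<'EOF'
{
  "zero_optimization": {
    "stage": 2
  },
  "train_batch_size": "auto"
}
EOF

# Old single-process launch:
#   python run_trainer.py --output_dir out ...
#
# New DeepSpeed launch (multi-node via a hostfile):
#   deepspeed --hostfile hostfile run_trainer.py --output_dir out ... --deepspeed ds_config.json
```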

You should find everything that you need over here: DeepSpeed Integration — transformers 4.7.0 documentation (huggingface.co)


Thanks! I already looked at that page. It’s still unclear to me what the purpose of `sharded_ddp` is, since it seems to be related to DeepSpeed. Could you explain a bit more about the usage of `sharded_ddp`? Thanks!