I understand that we will need to change a regular call function like python run_trainer.py ...
to deepspeed --hostfile <hostfile> run_trainer.py ... --deepspeed <deepspeed_config_file>
.
Besides the deepspeed
argument, is there anything else I should change, for example, sharded_ddp
, ddp_find_unused_parameters
, skip_memory_metrics
, etc.?