Slow processing with map when using deepspeed or fairscale

What about DDP with the same number of processes?

```
python -m torch.distributed.launch --nproc_per_node=3
```

You will most likely see the same slow-down, since you now have more than one process competing over your limited resources. So if the problem is the same, then the culprit is neither deepspeed nor fairscale, but the number of processes you use.
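If the contention comes from every process redundantly running the same `map()` preprocessing, one common workaround is to let only the main process do the work while the others wait and then pick up the on-disk cache. Here is a minimal sketch of that pattern, assuming the process group is initialized via the launcher and that `LOCAL_RANK` is set in the environment (newer launchers do this; older `torch.distributed.launch` passes `--local_rank` as a CLI argument unless `--use_env` is given). The `tokenize` function and the dataset are placeholders:

```python
import os
import torch.distributed as dist
from datasets import load_dataset

def tokenize(example):
    # hypothetical stand-in for your real preprocessing
    example["length"] = len(example["text"])
    return example

# the launcher sets MASTER_ADDR/MASTER_PORT/RANK/WORLD_SIZE for env:// init
dist.init_process_group(backend="gloo")
local_rank = int(os.environ["LOCAL_RANK"])

ds = load_dataset("imdb", split="train")  # placeholder dataset

# Only the main local process runs the expensive map() and writes the
# on-disk cache; the others wait at the barrier and then load the cached
# result instead of recomputing it and competing for the same CPUs.
if local_rank != 0:
    dist.barrier()
ds = ds.map(tokenize)
if local_rank == 0:
    dist.barrier()
```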

When debugging such problems with deepspeed or fairscale, always try to take them out of the equation first and reproduce the same setup in straight pytorch.
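For example, a minimal repro script along these lines (again with a placeholder dataset and `tokenize` function) lets you time `map()` under plain `torch.distributed`, with no deepspeed or fairscale involved:

```python
# repro.py - time datasets.map() under plain torch.distributed, nothing else
import time
import torch.distributed as dist
from datasets import load_dataset

def tokenize(example):
    example["length"] = len(example["text"])  # stand-in for real preprocessing
    return example

dist.init_process_group(backend="gloo")  # no GPU work needed for this test
rank = dist.get_rank()

ds = load_dataset("imdb", split="train")  # placeholder dataset

start = time.time()
ds = ds.map(tokenize, load_from_cache_file=False)  # force recompute to measure
print(f"rank {rank}: map() took {time.time() - start:.1f}s")
```

Launch it the same way, e.g. `python -m torch.distributed.launch --nproc_per_node=3 repro.py`. If the timings match what you see with deepspeed or fairscale, the frameworks are off the hook and it's the process count.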