LM example run_clm.py isn't distributing data across multiple GPUs as expected

If you want to use DDP (Distributed Data Parallel), you do need to launch the script with `python -m torch.distributed.launch` rather than plain `python` — otherwise only a single process is started and the data won't be sharded across GPUs. (On recent PyTorch versions, `torchrun` is the replacement for the deprecated `torch.distributed.launch`.)
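For example, assuming a single machine with two GPUs, the launch could look like the sketch below. The model and dataset arguments are illustrative, not from the original thread; adjust `--nproc_per_node` and the script flags to your setup:

```bash
# Spawns one process per GPU; each process gets a distinct shard of the data.
python -m torch.distributed.launch --nproc_per_node=2 run_clm.py \
    --model_name_or_path gpt2 \
    --dataset_name wikitext \
    --dataset_config_name wikitext-2-raw-v1 \
    --do_train \
    --output_dir /tmp/test-clm
```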
