It depends on how you launch the script. If you use torch.distributed.launch (or have an accelerate config set up for multi-GPU), it will use DistributedDataParallel. To use model parallelism, just launch with python myscript.py and it should pick up model parallelism automatically. (If you find it does not, or need some more assistance, let me know!)
You can verify this by checking whether trainer.args.parallel_mode prints ParallelMode.NOT_DISTRIBUTED.
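
If it helps, here's a minimal sketch of that check (assuming transformers and torch are installed; the output_dir value is just a placeholder):

```python
from transformers import TrainingArguments

# Placeholder output_dir; any path works for this check.
args = TrainingArguments(output_dir="tmp_check")

# Launched with plain `python myscript.py`, this should report
# ParallelMode.NOT_DISTRIBUTED; launched via torch.distributed.launch
# (or an accelerate multi-GPU config), it should report
# ParallelMode.DISTRIBUTED instead.
print(args.parallel_mode)
```

Since trainer.args is just your TrainingArguments, checking args.parallel_mode directly like this (or trainer.args.parallel_mode after you build the Trainer) tells you the same thing.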