It depends on how you launch the script. If you use torch.distributed.launch (or have an accelerate config set up for multi-GPU), it will use DistributedDataParallel. To use model parallelism, just launch with python myscript.py and it should pick up model parallelism automatically. (If you find it does not, or need some more assistance, let me know!)
You can verify this by checking whether trainer.args.parallel_mode prints ParallelMode.NOT_DISTRIBUTED.
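
If it helps, here's a minimal sketch of that check (assuming transformers and torch are installed; the output_dir value is just a placeholder):

```python
from transformers import TrainingArguments

# Placeholder output_dir; any path works for this check.
args = TrainingArguments(output_dir="tmp_check")

# Launched with plain `python myscript.py`, this should report
# ParallelMode.NOT_DISTRIBUTED; launched via torch.distributed.launch
# (or an accelerate multi-GPU config), it should report
# ParallelMode.DISTRIBUTED instead.
print(args.parallel_mode)
```

Since trainer.args is just your TrainingArguments, checking args.parallel_mode directly like this (or trainer.args.parallel_mode after you build the Trainer) tells you the same thing.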