Does the Trainer use DDP or DP? If it uses DDP, why is the GPU memory consumed on cuda:0 much larger than on the other cards when I train with multiple GPUs? Also, if I increase per_device_train_batch_size and cuda:0 runs out of memory, will the Trainer shard the model parameters onto the other cards by itself, or do I need to set some parameters? An example would help.
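For reference, here is a minimal sketch of the kind of script this question is about. The model name, dataset, and hyperparameters are placeholders, not my actual setup; the only parameter that matters for the question is per_device_train_batch_size and how the script is launched.

```python
# Minimal sketch of a multi-GPU Trainer run (placeholder model/dataset).
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "bert-base-uncased"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Placeholder dataset, tokenized up front so the Trainer can batch it directly.
dataset = load_dataset("imdb", split="train[:1%]")
dataset = dataset.map(
    lambda x: tokenizer(x["text"], truncation=True, padding="max_length", max_length=128),
    batched=True,
)

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=8,  # the parameter in question
    num_train_epochs=1,
)

trainer = Trainer(model=model, args=args, train_dataset=dataset)
trainer.train()

# Launched as `python train.py` with several visible GPUs, the Trainer falls
# back to DataParallel (DP), which keeps extra state on cuda:0; launched as
# `torchrun --nproc_per_node=4 train.py`, it runs DistributedDataParallel (DDP)
# with one process per GPU.
```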