Is the trainer DDP or DP?

Does the Trainer use DDP or DP? If it is DDP, why is it that when I train with multiple graphics cards, the memory consumed on cuda:0 is much larger than on the other cards? And if I increase per_device_train_batch_size until cuda:0 runs out of memory, will the Trainer shard the model parameters across the other cards by itself, or do I need to set some parameter? Could someone give an example?
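
To make the question concrete, here is a minimal sketch of the kind of script I mean (the model name, dataset, and batch size are placeholders, not my real setup):

```python
# Minimal sketch (placeholders throughout). The launch command is part of my question:
#   python train.py                       # single process: is this DP, piling memory onto cuda:0?
#   torchrun --nproc_per_node=4 train.py  # one process per card: is this DDP, with balanced memory?
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "bert-base-uncased"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Placeholder dataset, tokenized to a fixed length so batches are uniform.
dataset = load_dataset("imdb", split="train[:1%]")
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True,
                            padding="max_length", max_length=128),
    batched=True,
)

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=16,  # the setting I increase when cuda:0 runs out of memory
    num_train_epochs=1,
)

trainer = Trainer(model=model, args=args, train_dataset=dataset)
trainer.train()
```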