Which data parallelism does the Trainer use: DP or DDP?

I tried searching the docs, but I didn't find the answer anywhere.

Thank you


It depends on how you launch your training script: with plain python it will use DP (DataParallel), and with python -m torch.distributed.launch it will use DDP (DistributedDataParallel).
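A minimal sketch of the two launch styles, assuming a Trainer-based script named train.py (the script name and GPU count are placeholders):

```shell
# DP: a single process drives all visible GPUs via torch.nn.DataParallel
python train.py

# DDP: the launcher spawns one process per GPU, each wrapping the model
# in torch.nn.parallel.DistributedDataParallel
python -m torch.distributed.launch --nproc_per_node=2 train.py
```

Note that on recent PyTorch versions, torchrun --nproc_per_node=2 train.py is the recommended replacement for python -m torch.distributed.launch. DDP is generally the faster option since it avoids the per-step scatter/gather overhead of DP.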


Perhaps useful to you: Using Transformers with DistributedDataParallel — any examples?
