Which data parallelism does Trainer use: DataParallel (DP) or DistributedDataParallel (DDP)?

Perhaps useful to you: Using Transformers with DistributedDataParallel — any examples?
