In the setup described below, the Trainer will handle multi-GPU work. Which method does it use?
DataParallel (DP), TensorParallel (TP), PipelineParallel (PP), or DDP, which one?
Based on this line of code, it looks like it is using nn.DataParallel. However, I haven't fully reviewed every line of the Trainer class, so it may also use other methods at other points.
Thank you for the information. The old Trainer documentation required configuring that, but the new documentation does not mention it. Old doc: Trainer — transformers 4.7.0 documentation