When the Trainer handles multi-GPU training, which method does it use: DataParallel (DP), TensorParallel (TP), PipelineParallel (PP), or DDP?
Based on this line of code, it looks like it is using nn.DataParallel. However, I haven't gone through every line of the Trainer class, so it may also use other methods elsewhere.
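For reference, here is a minimal sketch of what that single-process data parallelism amounts to (the nn.Linear model and the device checks are illustrative stand-ins, not the Trainer's actual code):

```python
import torch
from torch import nn

model = nn.Linear(10, 2)  # stand-in for any model

if torch.cuda.device_count() > 1:
    # nn.DataParallel replicates the model onto each visible GPU, splits
    # every input batch along dim 0, and gathers the outputs back on GPU 0.
    model = nn.DataParallel(model)

if torch.cuda.is_available():
    model = model.to("cuda")
```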
Thank you for the information. The old Trainer documentation described how to configure this, but the new documentation does not mention it. Old doc: Trainer — transformers 4.7.0 documentation
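As far as I know, recent versions don't need any configuration in the script itself; it depends only on how you launch it. A minimal sketch (the model name and DummyDataset are placeholders for illustration, not a recommended setup):

```python
# train.py — minimal Trainer setup; note there are no multi-GPU flags here.
import torch
from torch.utils.data import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

class DummyDataset(Dataset):
    """Placeholder standing in for a real tokenized corpus."""
    def __init__(self, tokenizer):
        ids = tokenizer("hello world", return_tensors="pt")["input_ids"][0]
        self.examples = [ids] * 8

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, i):
        # Labels equal input_ids for causal language modelling.
        return {"input_ids": self.examples[i], "labels": self.examples[i]}

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125M")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125M")

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=4,  # per GPU, not total
)

trainer = Trainer(model=model, args=args, train_dataset=DummyDataset(tokenizer))
trainer.train()

# Launched as `python train.py` with several GPUs visible, the Trainer wraps
# the model in nn.DataParallel automatically. Launched as
# `torchrun --nproc_per_node=4 train.py`, one process is started per GPU and
# DistributedDataParallel (DDP) is used instead, which is usually faster.
```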
Hi @AndreaSottana, sorry to bother you. I am trying to fine-tune GPT-Neo, and because of a CUDA out-of-memory issue I need to use multiple GPUs. I use the Trainer from Hugging Face, which I understand will use multiple GPUs, but my results are very strange and very different from when I use a single GPU. Would you please help me with how to use multiple GPUs for fine-tuning the model?
Many thanks
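(One common reason multi-GPU results differ from a single-GPU run is the effective batch size: with nn.DataParallel it scales with the number of GPUs unless you compensate. A quick illustration with made-up numbers:)

```python
# Hypothetical numbers showing why a 4-GPU run can diverge from a 1-GPU
# baseline: the Trainer's effective batch size grows with the GPU count.
per_device_train_batch_size = 8
gradient_accumulation_steps = 1

effective_1_gpu = per_device_train_batch_size * 1 * gradient_accumulation_steps
effective_4_gpu = per_device_train_batch_size * 4 * gradient_accumulation_steps
print(effective_1_gpu, effective_4_gpu)  # 8 vs 32

# To match the single-GPU run on 4 GPUs, either divide the per-device batch
# size by 4, or keep it and expect fewer optimizer steps per epoch (which
# also changes how learning-rate schedules behave).
```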
Hi @Indramal, same question as above: would you please help me with how to use multiple GPUs for fine-tuning the model?
Many thanks