Minimal changes for using DataParallel?

Hi all! I’m playing around with a bit of fine-tuning, just to get a basic understanding of how it works. I’ve successfully done some simple single-GPU tunes, and my next step is to try multi-GPU training. I’ve read the help page on efficient training on multiple GPUs, and was originally planning to do the training with DistributedDataParallel, but at this stage I want to run the training in a notebook, so I can’t easily use the launcher for that. A ChatGPT conversation led me to believe that I could stay in the notebook (at the cost of some efficiency) by using DataParallel.

From the same ChatGPT session, I got the impression that using DP was as simple as wrapping my model in DataParallel:

from torch.nn import DataParallel

parallel_model = DataParallel(model).cuda()

…and then passing parallel_model into the Trainer. However, that leads to an IndexError suggesting that somehow the dataset isn’t getting through to the Trainer. You can see the full code and the error in this notebook. The same code runs the training successfully if model is passed into the Trainer instead of parallel_model.
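
For reference, the Trainer setup was roughly this (the argument values and the train_dataset name are illustrative placeholders rather than the exact code from my notebook):

from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=8,
)

trainer = Trainer(
    model=parallel_model,          # the DataParallel-wrapped model from above
    args=training_args,
    train_dataset=train_dataset,   # placeholder for the tokenized training dataset
)
trainer.train()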

Is there a simple way to use DataParallel in a notebook like this? Or is this a blind alley I should abandon, and focus on DDP instead?

Update for anyone else with the same problem: I’m now 99% sure it was a ChatGPT hallucination. After much digging, I don’t believe it’s possible to simply wrap a model in DataParallel and then pass it to the Trainer.

I wound up changing the notebook so that it was a regular script, then running it with

torchrun --nproc_per_node=2 script.py

…and it worked fine.
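
For anyone following the same path, the script is essentially the notebook code unchanged, with the plain model passed to the Trainer and no manual wrapping anywhere; as far as I can tell, the Trainer picks up the distributed environment that torchrun sets up and handles DDP itself. A rough sketch (the model name, hyperparameters, and train_dataset are placeholders, not my exact code):

# script.py, run with: torchrun --nproc_per_node=2 script.py
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments

model = AutoModelForCausalLM.from_pretrained("gpt2")   # placeholder model
tokenizer = AutoTokenizer.from_pretrained("gpt2")

training_args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=8,
    num_train_epochs=1,
)

trainer = Trainer(
    model=model,                  # plain model, no DataParallel wrapping
    args=training_args,
    train_dataset=train_dataset,  # placeholder: build the dataset as in the notebook
)
trainer.train()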

My takeaway is that it doesn’t seem possible to do multi-GPU training inside a notebook, which is fine! I can build a simple model in a notebook, then switch to a script when I want to scale it up.