Minimal changes for using DataParallel?

Hi all! I’m playing around with a bit of fine-tuning, just to get a basic understanding of how it works. I’ve successfully done some simple single-GPU tunes, and my next step is to try multi-GPU training. I’ve read the help page on efficient training on multiple GPUs, and was originally planning to do the training with DistributedDataParallel, but at this stage I want to run the training in a notebook, so I can’t easily use the launcher for that. A ChatGPT conversation led me to believe that I could stay in the notebook (at the cost of some efficiency) by using DataParallel.

From the same ChatGPT session, I got the impression that using DP was as simple as wrapping my model in DataParallel:

from torch.nn import DataParallel

parallel_model = DataParallel(model).cuda()

…and then passing parallel_model into the Trainer. However, that leads to an IndexError suggesting that somehow the dataset isn’t getting through to the Trainer. You can see the full code and the error in this notebook. The same code runs the training successfully if model is passed into the Trainer instead of parallel_model.
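
For reference, the Trainer setup was roughly this (the argument values and the train_dataset name are illustrative placeholders rather than the exact code from my notebook):

from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=8,
)

trainer = Trainer(
    model=parallel_model,          # the DataParallel-wrapped model from above
    args=training_args,
    train_dataset=train_dataset,   # placeholder for the tokenized training dataset
)
trainer.train()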

Is there a simple way to use DataParallel in a notebook like this? Or is this a blind alley I should abandon, and focus on DDP instead?

Update for anyone else with the same problem: I’m now 99% sure it was a ChatGPT hallucination. After much digging, I don’t believe it’s possible to simply wrap a model in DataParallel and then pass it to the Trainer.

I wound up changing the notebook so that it was a regular script, then running it with

torchrun --nproc_per_node=2 script.py

…and it worked fine.
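
For anyone following the same path, the script is essentially the notebook code unchanged, with the plain model passed to the Trainer and no manual wrapping anywhere; as far as I can tell, the Trainer picks up the distributed environment that torchrun sets up and handles DDP itself. A rough sketch (the model name, hyperparameters, and train_dataset are placeholders, not my exact code):

# script.py, run with: torchrun --nproc_per_node=2 script.py
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments

model = AutoModelForCausalLM.from_pretrained("gpt2")   # placeholder model
tokenizer = AutoTokenizer.from_pretrained("gpt2")

training_args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=8,
    num_train_epochs=1,
)

trainer = Trainer(
    model=model,                  # plain model, no DataParallel wrapping
    args=training_args,
    train_dataset=train_dataset,  # placeholder: build the dataset as in the notebook
)
trainer.train()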

My takeaway is that it doesn’t seem possible to do multi-GPU training inside a notebook, which is fine! I can build a simple model in a notebook, then switch to a script when I want to scale it up.