Hi all! I’m playing around with a bit of fine-tuning, just to get a basic understanding of how it works. I’ve successfully done some simple single-GPU tunes, and my next step is to try multi-GPU training. I’ve read the help page on efficient training on multiple GPUs, and was originally planning to do the training with Distributed DataParallel, but at this stage I want to run the training in a notebook, so I can’t easily use the launcher for that. A ChatGPT conversation gave me the impression that I could stay in the notebook (at the cost of some efficiency) by using DataParallel.
From the same ChatGPT session, I got the impression that using DP was as simple as wrapping my model in DataParallel:
from torch.nn import DataParallel
parallel_model = DataParallel(model).cuda()
…and then passing parallel_model into the trainer. However, that leads to an IndexError, which suggests that somehow the dataset isn’t getting through to the trainer. You can see the full code and the error in this notebook. The same code successfully runs the training if model is passed to the Trainer instead of parallel_model.
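For context, this is roughly how I’m setting up the Trainer (the dataset variables and training arguments below are simplified placeholders, not my exact notebook code):

from torch.nn import DataParallel
from transformers import Trainer, TrainingArguments

# Wrap the single-GPU model so forward passes are split across the visible GPUs
parallel_model = DataParallel(model).cuda()

training_args = TrainingArguments(
    output_dir="test-trainer",       # placeholder output directory
    per_device_train_batch_size=8,
    num_train_epochs=1,
)

trainer = Trainer(
    model=parallel_model,            # raises IndexError; works if I pass plain `model`
    args=training_args,
    train_dataset=train_dataset,     # placeholder dataset variables
    eval_dataset=eval_dataset,
)
trainer.train()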
Is there a simple way to use DataParallel in a notebook like this? Or is this a blind alley I should abandon, and focus on DDP instead?