Minimal changes for using DataParallel?

Update for anyone else with the same problem: I’m now 99% sure the suggestion I was following was a ChatGPT hallucination. After much digging, it doesn’t appear to be possible to simply wrap a model in DataParallel and then use it with the Trainer.
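My rough understanding of why: the Trainer wants to manage device placement and parallelism itself, so handing it a model you’ve already wrapped just fights it. For reference, this is more or less the pattern that didn’t work for me (a sketch with a placeholder model, not my exact code):

```python
import torch
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

# The pattern that did NOT work for me (sketch; the model and
# arguments are placeholders): wrapping the model in DataParallel
# manually before handing it to the Trainer.
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
model = torch.nn.DataParallel(model)

trainer = Trainer(model=model, args=TrainingArguments(output_dir="out"))
```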

I wound up converting the notebook into a regular script and running it with

torchrun --nproc_per_node=2 script.py

…and it worked fine.
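For completeness, here’s a minimal sketch of the kind of script this became; the model and dataset are illustrative placeholders, not my actual setup:

```python
# script.py (note there is no DataParallel anywhere: launched via
# torchrun, the Trainer detects the distributed environment and sets
# up DistributedDataParallel itself; model/dataset are placeholders)
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

def main():
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

    # Tokenize a small slice of a toy dataset so the script runs quickly.
    train_ds = load_dataset("imdb", split="train[:1%]").map(
        lambda batch: tokenizer(
            batch["text"], truncation=True, padding="max_length", max_length=128
        ),
        batched=True,
    )

    args = TrainingArguments(
        output_dir="out",
        per_device_train_batch_size=8,  # per process, i.e. per GPU
        num_train_epochs=1,
    )

    trainer = Trainer(model=model, args=args, train_dataset=train_ds)
    trainer.train()

if __name__ == "__main__":
    main()
```

With torchrun --nproc_per_node=2, each of the two processes gets its own GPU and its own shard of each batch, which is why the script itself needs no manual wrapping.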

My takeaway is that it doesn’t seem possible to do multi-GPU training inside a notebook, which is fine! I can build a simple model in a notebook, then switch to a script when I want to scale it up.