Why is Trainer only using 1 GPU (not 4)?

The Transformers Trainer is only using 1 of my 4 available GPUs. Why is that?
Hi, I've set CUDA_VISIBLE_DEVICES=0,1,2,3 and torch.cuda.device_count() reports 4. But when I run my Trainer, nvtop shows that only GPU 0 is computing anything. I would expect all 4 GPU usage bars in the screenshot below to be pegged, but devices 1-3 sit at 0% usage:

[nvtop screenshot: GPU 0 busy, GPUs 1-3 at 0% utilization]
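For reference, this is roughly the check I'm doing (a minimal sketch; in my actual setup the environment variable is exported in the shell before launch):

```python
import os

# Exported in my shell before starting Python; shown here for completeness.
# It has to be set before torch initializes CUDA to have any effect.
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0,1,2,3")

import torch

print(torch.cuda.device_count())  # prints 4 for me
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))
```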

I even tried manually setting trainer.args._n_gpu = 4 (rather than leaving it at whatever the Trainer had set), but it had no effect.
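Concretely, that attempt looked like the following (a minimal sketch using a fresh TrainingArguments; in the notebook I poked the same field on the already-built trainer via trainer.args, and _n_gpu is a private attribute, so I knew this was a hack):

```python
from transformers import TrainingArguments

args = TrainingArguments(output_dir="out")  # "out" is just a placeholder dir
print(args.n_gpu)  # what the Trainer machinery thinks it has to work with

args._n_gpu = 4    # the manual override I tried; nvtop utilization unchanged
print(args.n_gpu)  # the public property appears to read this field back
```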

Someone will ask to see the full code for what I’m doing, which is understandable. It is a clone of this Kaggle notebook: Music Genre Classification with Wav2Vec2 | Kaggle

PS - In searching for answers, I notice people asking how to limit the number of GPUs to 1, but I'm trying to do the opposite: use all the GPUs! Or if using all visible GPUs is supposed to be the default, why isn't that happening here?

Does the Hugging Face Trainer (or the models themselves) not automatically invoke DistributedDataParallel? If that omission is the issue, fair enough; I just don't see any usage of "parallel" or "accelerate" anywhere in the notebook's code. So perhaps I need to add that manually, something like the sketch below?
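If I do need to set it up myself, my understanding (an assumption on my part, not anything stated in the notebook) is that DDP wants one process per GPU, launched with something like torchrun --nproc_per_node=4 train.py or accelerate launch --num_processes 4 train.py, where train.py is a hypothetical script export of the notebook. Here's a small check I'd drop into the script to see whether a distributed run is actually active:

```python
import os

import torch.distributed as dist

# RANK / WORLD_SIZE / LOCAL_RANK are the standard environment variables that
# launchers such as torchrun set; under a plain `python train.py` run they
# are absent, so this doubles as a "did my launcher work" check.
print("RANK =", os.environ.get("RANK"))
print("WORLD_SIZE =", os.environ.get("WORLD_SIZE"))
print("distributed initialized:", dist.is_available() and dist.is_initialized())
```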

I also tried taking a look at the accelerate example notebook for SimpleNLP, but it crashes with SIGSEGV for me.