How to use specific GPUs with Accelerator to train the model?

I'm training my own prompt-tuning model with the transformers package, following the training framework in the official example. My training environment is a single machine with multiple GPUs: the machine has 8 GPU cards and I only want to use some of them. However, Accelerator fails to work properly: it just puts everything on gpu:0, so I cannot use multiple GPUs. Setting os.environ['CUDA_VISIBLE_DEVICES'] also fails to work.
I have re-written the code without Accelerator. Instead, I use nn.DataParallel together with os.environ['CUDA_VISIBLE_DEVICES'] to specify the GPUs, and everything works fine in that case.
So what's the reason? According to the manual, I think Accelerator should be able to take care of all of this. Thank you so much for your help!
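For reference, the Accelerator part of my script follows the usual pattern from the example. Here is a minimal, self-contained sketch with a dummy model and dummy data standing in for my actual prompt-tuning code:

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from accelerate import Accelerator

    accelerator = Accelerator()

    # Dummy model and data standing in for the real prompt-tuning model/dataset
    model = torch.nn.Linear(16, 2)
    dataset = TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,)))
    dataloader = DataLoader(dataset, batch_size=8)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    # Accelerator is supposed to place everything on the right device(s)
    model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

    loss_fn = torch.nn.CrossEntropyLoss()
    for inputs, labels in dataloader:
        logits = model(inputs)
        loss = loss_fn(logits, labels)
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()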

FYI, here is the version information:
python 3.6.8
transformers 3.4.0
accelerate 0.5.1
NVIDIA GPU cluster

Accelerator does not use DataParallel on purpose, since it's not recommended by PyTorch. Have you properly set up your config with accelerate config and launched your script with accelerate launch?

Alternatively, did you launch your script with python -m torch.distributed.launch ...? See more commands here.
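
As a quick sanity check, once the script is started with accelerate launch you can print what each process sees (a minimal standalone snippet, not tied to your training code):

    from accelerate import Accelerator

    accelerator = Accelerator()
    # With a correct multi-GPU launch, num_processes equals the number of GPUs
    # you launched on, and each process reports a different device.
    print(f"process {accelerator.process_index}/{accelerator.num_processes} "
          f"on device {accelerator.device}")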

Thanks for your reply! I tried accelerate config, but I haven't found a place to specify which GPU cards I want to use. For example, if I set nproc_per_node to 4, it automatically uses gpu:0, gpu:1, gpu:2, gpu:3 on my machine. Is there a way to change this behavior?
Thank you so much~

No, you will also need to add CUDA_VISIBLE_DEVICES="0,1,2,3" when launching, to use those four GPUs.

Yes, I actually did this by setting os.environ['CUDA_VISIBLE_DEVICES'] = "3,4,5,6" at the beginning of my code, but it doesn't work. Did I miss anything?
Thank you for your help!

No, it needs to be done before the launching command, since the launcher decides how many processes to spawn and which GPUs they can see before your script (and its os.environ line) ever runs:

CUDA_VISIBLE_DEVICES="3,4,5,6" accelerate launch training_script.py


Still fails to work correctly :no_mouth:

Why do you say that? It seems good to me.

Oh, sorry. I just checked the GPU state and it's great. I stupidly thought the printed device should show cuda:3/4/5/6 (it shouldn't, of course, since only 4 GPUs are visible).
Thank you so much for your quick reply. Your help really saved me, since it's my first time using the accelerate package.

Yes, you can't completely trust the devices that get printed :slight_smile:
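
For example, assuming CUDA_VISIBLE_DEVICES="3,4,5,6" was set before the launch, a small standalone check shows the renumbering:

    import torch

    # With CUDA_VISIBLE_DEVICES="3,4,5,6", PyTorch only sees 4 devices and
    # renumbers them as cuda:0 .. cuda:3, so the printed device names never
    # show 3/4/5/6 even though those physical cards are the ones being used.
    print(torch.cuda.device_count())  # 4, not 8
    for i in range(torch.cuda.device_count()):
        print(f"cuda:{i} ->", torch.cuda.get_device_name(i))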