How to use specified GPUs with Accelerator to train the model?

EchoShao8899 · October 21, 2021, 11:54am

I’m training my own prompt-tuning model using the transformers package, following the training framework in the official example. My training environment is a single machine with multiple GPUs: the machine has 8 GPU cards and I only want to use some of them. However, the Accelerator fails to work properly. It just puts everything on gpu:0, so I cannot use multiple GPUs. Also, setting os.environ['CUDA_VISIBLE_DEVICES'] has no effect.
I have rewritten the code without Accelerator. Instead, I use nn.DataParallel with os.environ['CUDA_VISIBLE_DEVICES'] to specify the GPUs. Everything works fine in this case.
So what’s the reason? According to the manual, I think Accelerator should be able to take care of all these things. Thank you so much for your help!

FYI, here is the version information:
python 3.6.8
transformers 3.4.0
accelerate 0.5.1
NVIDIA gpu cluster

sgugger · October 21, 2021, 11:56am

Accelerator does not use DataParallel on purpose since it’s not recommended by PyTorch. Have you properly set up your config in accelerate config and launched your script with accelerate launch?

Alternatively, did you launch your script with python -m torch.distributed.launch ...? See more commands here.

EchoShao8899 · October 21, 2021, 2:11pm

Thanks for your reply! I tried to use accelerate config, but I haven’t found a place to specify the GPU cards that I want to use. For example, if I set nproc_per_node to 4, it will automatically use gpu:0, gpu:1, gpu:2, gpu:3 on my machine. Is there a way to change this behavior?
Thank you so much~

sgugger · October 21, 2021, 2:16pm

No, you will also need to add CUDA_VISIBLE_DEVICES="0,1,2,3" when launching, to use those four GPUs.

EchoShao8899 · October 21, 2021, 3:31pm

Yes, I actually did this by setting os.environ['CUDA_VISIBLE_DEVICES'] = "3,4,5,6" at the beginning of my code. But it doesn’t work. Did I miss anything?
Thank you for your help!

sgugger · October 21, 2021, 3:32pm

No, it needs to be done before the launching command:

CUDA_VISIBLE_DEVICES="3,4,5,6" accelerate launch training_script.py
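
The same idea can be sketched from a small Python wrapper, which may help on shells that do not support the `VAR=value command` prefix. This is a minimal sketch, assuming `accelerate` is on the PATH and that the number of processes comes from your accelerate config; `build_launch` and `launch` are hypothetical helper names, not part of accelerate.

```python
import os
import subprocess
from typing import Dict, List, Tuple


def build_launch(gpu_ids: str, script: str) -> Tuple[List[str], Dict[str, str]]:
    """Return the launch command and a child environment restricted to gpu_ids."""
    env = dict(os.environ)                 # copy, so the parent process is untouched
    env["CUDA_VISIBLE_DEVICES"] = gpu_ids  # set before the launcher process starts
    cmd = ["accelerate", "launch", script]
    return cmd, env


def launch(gpu_ids: str, script: str) -> int:
    """Run the training script in a child process and return its exit code."""
    cmd, env = build_launch(gpu_ids, script)
    return subprocess.run(cmd, env=env).returncode
```

Calling `launch("3,4,5,6", "training_script.py")` would then start the script with only those four cards visible to it.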

EchoShao8899 · October 21, 2021, 3:41pm

Still fails to work correctly

sgugger · October 21, 2021, 3:44pm

Why do you say that? It seems good to me.

EchoShao8899 · October 21, 2021, 3:48pm

Ouch, sorry. I just checked the GPU state and it’s working. I stupidly thought the device should show cuda:3/4/5/6 (it shouldn’t, of course, since only 4 GPUs are visible).
Thank you so much for your quick reply. Your help really saved me, since it’s my first time using the accelerate package.

sgugger · October 21, 2021, 3:52pm

Yes, you can’t completely trust the devices printed.
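
The reason is that once CUDA_VISIBLE_DEVICES is set, the visible cards are renumbered from 0 inside the process. A minimal sketch of that renumbering, assuming the variable holds plain comma-separated integer indices (`logical_to_physical` is a hypothetical helper for illustration only):

```python
from typing import Dict


def logical_to_physical(visible: str) -> Dict[int, int]:
    """Map the index seen inside the process (cuda:N) to the physical card index."""
    ids = [int(tok) for tok in visible.split(",") if tok.strip()]
    return {logical: physical for logical, physical in enumerate(ids)}


mapping = logical_to_physical("3,4,5,6")
print(mapping)  # {0: 3, 1: 4, 2: 5, 3: 6}
```

So a process that reports cuda:0 is actually running on physical card 3 here, which is what nvidia-smi shows.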

Paluck · June 29, 2022, 11:58am

Sir, I am getting the following error. Can you please suggest a solution?
GPUAccelerator can not run on your system since the accelerator is not available. The following accelerator(s) is available and can be passed into accelerator argument of Trainer: ['cpu'].

wmmw · October 16, 2022, 1:29pm

Hi, I have the same error as you. Have you found a solution? Hoping for your reply!

EchoShao8899 · October 31, 2022, 6:34am

Hi,

What the second post in this thread says works for me. A working command looks like this:

CUDA_VISIBLE_DEVICES={gpus you gonna use} python -m torch.distributed.launch --nproc_per_node={the number of gpu used} \
  your_python_script.py {other arguments for your python script}

Sorry for the late reply, hope you have solved the problem.

muellerzr · October 31, 2022, 9:50am

accelerate launch also now lets you specify --gpu_ids as a string too
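
As a rough sketch of what that command line looks like, here is a small Python helper that assembles it. It assumes a recent accelerate release where `--gpu_ids` is available and that the number of processes still comes from your config; `gpu_ids_command` is a hypothetical name used only for illustration.

```python
from typing import List, Sequence


def gpu_ids_command(gpu_ids: Sequence[int], script: str) -> List[str]:
    """Build an `accelerate launch` command that selects GPUs via --gpu_ids."""
    ids = ",".join(str(i) for i in gpu_ids)  # the flag takes a comma-separated string
    return ["accelerate", "launch", "--gpu_ids", ids, script]


print(" ".join(gpu_ids_command([3, 4, 5, 6], "training_script.py")))
# accelerate launch --gpu_ids 3,4,5,6 training_script.py
```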

ndvb · April 10, 2023, 12:43pm

Why don’t you just make our life easier and simply add a parameter to the trainer to get the GPU id or list of ids?

diaafayed72 · August 23, 2024, 11:07am

This does not work in the Windows command prompt.
