I’m training my own prompt-tuning model with the transformers package, following the training framework in the official example. My training environment is a single-machine, multi-GPU setup: the machine has 8 GPU cards and I only want to use some of them. However, Accelerator fails to work properly; it just puts everything on gpu:0, so I cannot use multiple GPUs. Setting os.environ['CUDA_VISIBLE_DEVICES'] also fails to work.
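In case it helps narrow things down: one common reason the in-script assignment has no effect is ordering. CUDA_VISIBLE_DEVICES is only read when the CUDA context is first initialised, so if `import torch` (or anything that touches the GPU) runs before the assignment, the driver has already seen all 8 cards and the variable is silently ignored. A minimal sketch of the ordering I mean (the card ids "0,1" are just a hypothetical choice):

```python
import os

# Order matters: this assignment only takes effect if it runs BEFORE
# the first CUDA initialisation. If torch has already been imported
# and has touched the GPU, changing the variable here does nothing.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"  # hypothetical selection of 2 cards

# Only after this point should CUDA-related modules be imported:
# import torch
# torch.cuda.device_count()  # would now report 2, not 8
```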
I have rewritten the code without Accelerator, using os.environ['CUDA_VISIBLE_DEVICES'] to specify the GPUs instead. Everything works fine in this case.
So what’s the reason? According to the manual, Accelerator should be able to take care of all of this. Thank you so much for your help!
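One more data point, in case it matters: as far as I understand, Accelerate does multi-GPU by running one process per card, so the script has to be started with the accelerate launcher; running it with plain `python` gives a single process, which would explain everything landing on gpu:0. A hedged example of the launch command (train.py is a placeholder name, and I'm assuming an accelerate version that supports --gpu_ids):

```shell
# Launch 2 processes, one per selected card (2 and 3 here are hypothetical).
accelerate launch --num_processes 2 --gpu_ids "2,3" train.py
```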
FYI, here is the version information:
NVIDIA GPU cluster