Hi, I’m trying to fine-tune a model with Trainer in transformers, and I want to use a specific GPU on my server.
My server has two GPUs (index 0 and index 1), and I want to train my model on GPU index 1.
I’ve read the Trainer and TrainingArguments documentation, and I’ve already tried setting CUDA_VISIBLE_DEVICES, but it didn’t work for me:
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
(I ran this in Jupyter, before importing any other libraries.)
It raises a runtime error when the Trainer reaches the line self.model = model.to(args.device).
The error is: RuntimeError: CUDA error: all CUDA-capable devices are busy or unavailable
I’ve also tried torch.cuda.set_device(1), but that didn’t work either.
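To narrow this down, here is a minimal sketch of the check I think is relevant. It assumes a fresh kernel so that the environment variables are set before anything initializes CUDA. The helper name visible_gpu_names is made up for illustration and is not part of transformers or torch. One thing I'm fairly sure of: once only one GPU is exposed, PyTorch numbers it as cuda:0 inside the process, so torch.cuda.set_device(1) would then point at a device that no longer exists.

```python
import os

# Same settings as in my notebook; they only take effect if they are
# set before CUDA is initialized in this process.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "1"


def visible_gpu_names():
    """Return the names of the GPUs this process can see.

    Hypothetical helper for illustration. Returns an empty list when
    torch is not installed or CUDA is not usable.
    """
    try:
        import torch
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    # Visible devices are renumbered from 0, so physical GPU 1 shows up
    # here as index 0 when it is the only one exposed.
    return [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]


names = visible_gpu_names()
print(names)
```

If this prints a single name, the restriction is working and the model should be addressed as cuda:0. If it prints an empty list, CUDA is not usable in this process at all, which would point at something other than the Trainer.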
I don’t know how to set this up. It seems that neither class has an argument for choosing the GPU.
Please help me solve this problem.
Thank you.