Ideally we shouldn’t do this, but you can modify the last else statement in the _setup_devices function of training_args.py in the transformers library, from
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
to
device = torch.device("cuda:1" if torch.cuda.is_available() else "cpu")
I did that too, and I transferred all the data to the device, but when I call trainer.train() this error still comes up:
RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cuda:2
Yes, you’re correct @yaanhaan, but if you pass this argument into the Trainer, it’ll use it for TrainingArguments, which will set the given device for your run.
It worked for me, BTW.
This works for me:
import os
os.environ["CUDA_VISIBLE_DEVICES"]="1"
import torch
You have to set the env variable before importing torch. Now torch only sees device 1, not 0.
If you run this:
print(torch.cuda.current_device())
print(torch.cuda.is_available())
Torch will still report device 0, because it renumbers the visible devices starting from zero, but nvidia-smi will show that physical GPU 1 is the one in use.
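To make the recipe above concrete, here is a self-contained sketch that puts the two snippets together. It is guarded so it also runs on machines without torch or without a GPU; the comments about device numbers assume a multi-GPU box:

```python
import os

# Must happen BEFORE `import torch`: the CUDA runtime reads
# CUDA_VISIBLE_DEVICES only once, when it is first initialized.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

try:
    import torch

    if torch.cuda.is_available():
        # Only physical GPU 1 is visible, and torch renumbers the
        # visible devices from zero, so it reports device 0 here
        # while nvidia-smi shows activity on physical GPU 1.
        print(torch.cuda.current_device())
        print(torch.cuda.device_count())
except ImportError:
    # torch is not installed in this environment; the env var is still set.
    pass
```

The key point is the ordering: setting the variable after torch has initialized CUDA has no effect.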
Thank you, @josejames00, you saved my day with this solution
You are my hero
This worked, but it’s crazy that I still had to do this in 2025: the Trainer overwrites the device the model is on.
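For anyone who would rather not touch the training code at all, the same restriction can be applied when launching the process, equivalent to running CUDA_VISIBLE_DEVICES=1 python train.py in the shell (train.py is a placeholder for your own script). A minimal sketch, where the child process just echoes the variable to show it was inherited:

```python
import os
import subprocess
import sys

# Launch a child Python process with only physical GPU 1 visible.
# In a real run, the `-c` snippet would be replaced by `train.py`.
env = dict(os.environ, CUDA_VISIBLE_DEVICES="1")
out = subprocess.run(
    [sys.executable, "-c",
     'import os; print(os.environ["CUDA_VISIBLE_DEVICES"])'],
    env=env,
    capture_output=True,
    text=True,
)
print(out.stdout.strip())
```

Because the variable is set in the child's environment before the interpreter even starts, there is no way to get the import order wrong.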