Yes.
My code is here:
import torch
from transformers import Trainer, TrainingArguments

torch.cuda.set_device(1)
torch.cuda.current_device()  # returns 1
epochs = 3
training_args = TrainingArguments(
    output_dir='./results',
    overwrite_output_dir=True,
    do_train=True,
    do_predict=True,
    num_train_epochs=epochs,
    per_device_train_batch_size=190,
    learning_rate=5e-05,
    warmup_steps=500,
    logging_dir='./logs',
    logging_steps=10,
    save_steps=50,
    save_total_limit=100)
training_args.device

The output is:

device(type='cuda', index=0)
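For reference, a minimal way to isolate this in a fresh session (a sketch, assuming nothing else has touched CUDA yet; the last print is there to check whether merely accessing args.device resets the current device):

import torch
from transformers import TrainingArguments

torch.cuda.set_device(1)
print(torch.cuda.current_device())  # 1, as expected

args = TrainingArguments(output_dir='./results')
print(args.device)                  # reports index 0 in my case
print(torch.cuda.current_device())  # check whether accessing args.device reset this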
Then I create the Trainer:
trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_dataset)
!nvidia-smi

I can see two GPUs, but nvidia-smi shows that the process is still running on GPU #0. It seems that torch.cuda.set_device(1) doesn't work at all.
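In case it helps narrow things down, the only reliable way I know to pin the process to GPU 1 is to hide GPU 0 entirely with CUDA_VISIBLE_DEVICES. A sketch, assuming a fresh process where CUDA has not been initialized yet:

import os

# Must be set before torch initializes CUDA; physical GPU 1 is then the
# only visible device and is exposed to the process as cuda:0.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch
from transformers import TrainingArguments

print(torch.cuda.device_count())    # expected: 1
print(torch.cuda.current_device())  # expected: 0 (i.e. physical GPU 1)

training_args = TrainingArguments(output_dir='./results')
print(training_args.device)         # should now resolve to the remaining GPU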