Trainer.train throws RuntimeError: Expected all tensors to be on the same device

sssingh · April 1, 2022, 4:26am

I have moved the model to 'cuda' and confirmed that the TrainingArguments object does pick up 'cuda' as the device, but when I try to train, it throws this error. Here is the code:

Set device

import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
torch.cuda.current_device(), device

OUTPUT → (0, device(type='cuda'))

Create model and push to cuda

model = AutoModelForSequenceClassification.from_pretrained(MODEL_CKPT, num_labels=6).to(device)

Instantiate a TrainingArguments object

batch_size = 64
logging_steps = len(emotion_encoded['train']) // batch_size
epochs = 2
learning_rate = 2e-5
output_dir = MODEL_CKPT + '-finetuned-emotion-sssingh'
args = TrainingArguments(output_dir=output_dir,
                         per_device_train_batch_size=batch_size,
                         per_device_eval_batch_size=batch_size,
                         learning_rate=learning_rate,
                         weight_decay=0.01,
                         num_train_epochs=epochs,
                         evaluation_strategy='epoch',
                         disable_tqdm=False,
                         logging_steps=logging_steps,
                         log_level='error')

args.device
OUTPUT → device(type='cuda', index=0)

Instantiate a Trainer object and train model end-to-end

trainer = Trainer(model=model,
                  tokenizer=tokenizer,
                  args=args,
                  compute_metrics=performance_metric,
                  train_dataset=emotion_encoded['train'],
                  eval_dataset=emotion_encoded['validation'])

Train

trainer.train()

This throws the following error:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)

BramVanroy · April 1, 2022, 9:49am

TrainingArguments automatically sets the device to a GPU (cuda:0) if it is available.

huggingface/transformers/blob/9de70f213eb234522095cc9af7b2fac53afc2d87/src/transformers/training_args.py#L1069-L1079

            elif self.local_rank == -1:
                # if n_gpu is > 1 we'll use nn.DataParallel.
                # If you only want to use a specific subset of GPUs use `CUDA_VISIBLE_DEVICES=0`
                # Explicitly set CUDA to the first (index 0) CUDA device, otherwise `set_device` will
                # trigger an error that a device index is missing. Index 0 takes into account the
                # GPUs available in the environment, so `CUDA_VISIBLE_DEVICES=1,2` with `cuda:0`
                # will use the first GPU in that env, i.e. GPU#1
                device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
                # Sometimes the line in the postinit has not been run before we end up here, so just checking we're not at
                # the default value.
                self._n_gpu = torch.cuda.device_count()

You should not move your model to the GPU manually, so remove all the .to() calls. After initializing the TrainingArguments, you can verify that the GPU is being used by checking args.device.
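
Beyond checking args.device, it can help to see where the model's weights actually live. The following is only a minimal sketch: the helper name device_report is hypothetical and not part of transformers. It assumes the object you pass in exposes named_parameters() and named_buffers(), as any torch.nn.Module does.

```python
from collections import Counter


def device_report(model):
    """Count parameters and buffers per device for a torch.nn.Module-like object.

    Hypothetical helper for illustration; it only reads the `.device`
    attribute of each parameter and buffer.
    """
    counts = Counter()
    for _name, tensor in model.named_parameters():
        counts[str(tensor.device)] += 1
    for _name, tensor in model.named_buffers():
        counts[str(tensor.device)] += 1
    return dict(counts)
```

Calling device_report(trainer.model) after building the Trainer should normally show a single device. If both a cuda device and cpu appear for a single-GPU run, some weights or buffers were left behind, which is consistent with the error in the first post.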

If that is correct, and you are still experiencing an issue, it is possible that your custom function performance_metric does something with mixed tensors. In that case, please post the full error trace and the custom function.
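
If a custom function does end up combining tensors that it created itself with tensors it received, one possible approach is to move everything to a single device first. This is a sketch under assumptions, not something the Trainer requires: to_device is a hypothetical helper, and it assumes the batch is a dict whose tensor values expose .to(device), as torch.Tensor does.

```python
def to_device(batch, device):
    """Return a copy of `batch` with every tensor-like value moved to `device`.

    Hypothetical helper for illustration. Values without a `.to` method
    (ints, lists, None, ...) are passed through unchanged.
    """
    return {
        key: value.to(device) if hasattr(value, "to") else value
        for key, value in batch.items()
    }
```

For example, to_device(batch, args.device) would place a hand-built batch on the same device that TrainingArguments selected.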

Francal · April 29, 2022, 10:16am

Hi Bram,

I get the same error message from Trainer.train() when I run the code below in Colab with GPU as the runtime type. However, when I use CPU as the runtime type, it runs successfully without any errors. Do you have any advice on how I can run the code below on a GPU without getting the "Expected all tensors to be on the same device" error?

Colab Code

BramVanroy · April 29, 2022, 4:10pm

This may be the same issue as discussed here.

Francal · April 30, 2022, 1:48am

Thank you very much. The solution proposed in the link you provided worked perfectly.

LidorPrototype · May 17, 2023, 4:14am

I tried that, but it did not solve the issue for me at all. Any other ideas?
