Getting error: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

Hi, I’ve run into a problem with the code from the Hugging Face course section "Write your training loop in PyTorch": the program fails when it reaches the statement optimizer.step(). I verified that my code is exactly the same as in the Hugging Face video. I’d be grateful if anyone has a solution that fixes the issue.

The code is below:
from tqdm.auto import tqdm

progress_bar = tqdm(range(num_training_steps))

model.train()
for epoch in range(num_epochs):
    for batch in train_dataloader:
        batch = {k: v.to(device) for k, v in batch.items()}
        outputs = model(**batch)
        loss = outputs.loss
        loss.backward()

        optimizer.step()  # <-- error occurs here (same error on Google Colab and an Intel server with a GPU)
        lr_scheduler.step()
        optimizer.zero_grad()
        progress_bar.update(1)
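(For context, the loop assumes the setup from the same course section, which defines num_epochs, num_training_steps, and lr_scheduler roughly like this; reproduced from memory, so treat it as a sketch:)

from transformers import get_scheduler

num_epochs = 3
num_training_steps = num_epochs * len(train_dataloader)
lr_scheduler = get_scheduler(
    "linear",
    optimizer=optimizer,
    num_warmup_steps=0,
    num_training_steps=num_training_steps,
)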

This is the RuntimeError traceback from Google Colab:
RuntimeError                              Traceback (most recent call last)
in <cell line: 6>()
     10 loss = outputs.loss
     11 loss.backward()
---> 12 optimizer.step()
     13 lr_scheduler.step()
     14 optimizer.zero_grad()

3 frames
/usr/local/lib/python3.10/dist-packages/torch/optim/lr_scheduler.py in wrapper(*args, **kwargs)
     73 instance._step_count += 1
     74 wrapped = func.__get__(instance, cls)
---> 75 return wrapped(*args, **kwargs)
     76
     77 # Note that the returned function here is no longer a bound method,

/usr/local/lib/python3.10/dist-packages/torch/optim/optimizer.py in wrapper(*args, **kwargs)
    383 )
    384
--> 385 out = func(*args, **kwargs)
    386 self._optimizer_step_code()
    387

/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py in decorate_context(*args, **kwargs)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
--> 115         return func(*args, **kwargs)
    116
    117 return decorate_context

/usr/local/lib/python3.10/dist-packages/transformers/optimization.py in step(self, closure)
    574 # Decay the first and second moment running average coefficient
    575 # In-place operations to update the averages at the same time
--> 576 exp_avg.mul_(beta1).add_(grad, alpha=(1.0 - beta1))
    577 exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1.0 - beta2)
    578 denom = exp_avg_sq.sqrt().add_(group["eps"])

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

Hi,

Could you verify that both your model and your model inputs are on the GPU?
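A quick way to check both, using the names from your snippet (model, and the batch dict inside the loop, after the .to(device) line):

print(next(model.parameters()).device)          # should print: cuda:0
print({k: v.device for k, v in batch.items()})  # every value should be: cuda:0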

Hi Niels, thanks for your prompt reply.

I can verify that my model is on the GPU. This is the code for it:
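import torch

# Device-agnostic setup, as in the course: use the GPU if one is available
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
model.to(device)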

Maybe you initialized the optimizer before moving the model to "cuda", so the optimizer picked up the model parameters while they were still on the CPU.
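That is, an ordering like this would cause it (a sketch of the suspected bug; AdamW here is the one from transformers.optimization, as in your traceback, and lr=5e-5 is just the course's example value):

from transformers import AdamW

# Optimizer created while the model's parameters are still on the CPU...
optimizer = AdamW(model.parameters(), lr=5e-5)

# ...and the model is only moved to the GPU afterwards
model.to(device)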

Thanks, you were correct! I had initialized the optimizer before the device-agnostic code ran, so the device variable was not yet defined and the optimizer ended up with the default of "cpu". I moved the device-agnostic code to the start (which set device to "cuda"), and it worked fine after that.
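For anyone who finds this thread later, the order that works looks like this (a sketch combining the snippets above; lr=5e-5 is just an example value):

import torch
from transformers import AdamW

# 1. Device-agnostic setup first, so `device` exists before anything uses it
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

# 2. Move the model onto that device
model.to(device)

# 3. Only now create the optimizer, from parameters that are already on the GPU
optimizer = AdamW(model.parameters(), lr=5e-5)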