Getting error: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

Hi, I’ve run into a problem with the code from the Hugging Face course section "Write your training loop in PyTorch": the program fails when it reaches the statement optimizer.step(). I verified that my code is exactly the same as in the Hugging Face video. I’d be grateful if anyone has a solution that fixes the issue.

The code is below:
from tqdm.auto import tqdm

progress_bar = tqdm(range(num_training_steps))

model.train()
for epoch in range(num_epochs):
    for batch in train_dataloader:
        batch = {k: v.to(device) for k, v in batch.items()}
        outputs = model(**batch)
        loss = outputs.loss
        loss.backward()

        optimizer.step()  # <-- error occurs here (same error on Google Colab and an Intel server with a GPU)
        lr_scheduler.step()
        optimizer.zero_grad()
        progress_bar.update(1)
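(For context, the loop assumes the setup from the same course section, which defines num_epochs, num_training_steps, and lr_scheduler roughly like this; reproduced from memory, so treat it as a sketch:)

from transformers import get_scheduler

num_epochs = 3
num_training_steps = num_epochs * len(train_dataloader)
lr_scheduler = get_scheduler(
    "linear",
    optimizer=optimizer,
    num_warmup_steps=0,
    num_training_steps=num_training_steps,
)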

This is the RuntimeError traceback from Google Colab:
RuntimeError                              Traceback (most recent call last)
in <cell line: 6>()
     10 loss = outputs.loss
     11 loss.backward()
---> 12 optimizer.step()
     13 lr_scheduler.step()
     14 optimizer.zero_grad()

3 frames
/usr/local/lib/python3.10/dist-packages/torch/optim/lr_scheduler.py in wrapper(*args, **kwargs)
     73 instance._step_count += 1
     74 wrapped = func.__get__(instance, cls)
---> 75 return wrapped(*args, **kwargs)
     76
     77 # Note that the returned function here is no longer a bound method,

/usr/local/lib/python3.10/dist-packages/torch/optim/optimizer.py in wrapper(*args, **kwargs)
    383 )
    384
--> 385 out = func(*args, **kwargs)
    386 self._optimizer_step_code()
    387

/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py in decorate_context(*args, **kwargs)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
--> 115         return func(*args, **kwargs)
    116
    117 return decorate_context

/usr/local/lib/python3.10/dist-packages/transformers/optimization.py in step(self, closure)
    574 # Decay the first and second moment running average coefficient
    575 # In-place operations to update the averages at the same time
--> 576 exp_avg.mul_(beta1).add_(grad, alpha=(1.0 - beta1))
    577 exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1.0 - beta2)
    578 denom = exp_avg_sq.sqrt().add_(group["eps"])

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

Hi,

Could you verify that both your model and your model inputs are on the GPU?
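A quick way to check both, using the names from your snippet (model, and the batch dict inside the loop, after the .to(device) line):

print(next(model.parameters()).device)          # should print: cuda:0
print({k: v.device for k, v in batch.items()})  # every value should be: cuda:0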

Hi Niels, thanks for your prompt reply.

I can verify that my model is on the GPU. This is the code for it:
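import torch

# Device-agnostic setup, as in the course: use the GPU if one is available
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
model.to(device)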

Maybe you initialized the optimizer before moving the model to "cuda", so the optimizer picked up the model parameters while they were still on the CPU.
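That is, an ordering like this would cause it (a sketch of the suspected bug; AdamW here is the one from transformers.optimization, as in your traceback, and lr=5e-5 is just the course's example value):

from transformers import AdamW

# Optimizer created while the model's parameters are still on the CPU...
optimizer = AdamW(model.parameters(), lr=5e-5)

# ...and the model is only moved to the GPU afterwards
model.to(device)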

Thanks, you were correct! I had initialized the optimizer before the device-agnostic code ran, so the device variable was not yet defined and the optimizer ended up with the default of "cpu". I moved the device-agnostic code to the start (which set device to "cuda"), and it worked fine after that.
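For anyone who finds this thread later, the order that works looks like this (a sketch combining the snippets above; lr=5e-5 is just an example value):

import torch
from transformers import AdamW

# 1. Device-agnostic setup first, so `device` exists before anything uses it
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

# 2. Move the model onto that device
model.to(device)

# 3. Only now create the optimizer, from parameters that are already on the GPU
optimizer = AdamW(model.parameters(), lr=5e-5)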