Hi, I’m running into a problem with the code from the Hugging Face course section “Write your training loop in PyTorch”: when the program executes the statement optimizer.step(), it raises a RuntimeError. I verified that the code is exactly the same as in the Hugging Face video. I’d be grateful if anyone has a solution that fixes the issue.
The code is below:
from tqdm.auto import tqdm

progress_bar = tqdm(range(num_training_steps))

model.train()
for epoch in range(num_epochs):
    for batch in train_dataloader:
        batch = {k: v.to(device) for k, v in batch.items()}
        outputs = model(**batch)
        loss = outputs.loss
        loss.backward()
        optimizer.step()  # <-- error occurs here (same error on Google Colab and on an Intel server with a GPU)
        lr_scheduler.step()
        optimizer.zero_grad()
        progress_bar.update(1)
This is the RuntimeError traceback from Google Colab:
RuntimeError                              Traceback (most recent call last)
<ipython-input> in <cell line: 6>()
     10         loss = outputs.loss
     11         loss.backward()
---> 12         optimizer.step()
     13         lr_scheduler.step()
     14         optimizer.zero_grad()

3 frames
/usr/local/lib/python3.10/dist-packages/torch/optim/lr_scheduler.py in wrapper(*args, **kwargs)
     73                 instance._step_count += 1
     74                 wrapped = func.__get__(instance, cls)
---> 75                 return wrapped(*args, **kwargs)
     76
     77             # Note that the returned function here is no longer a bound method,

/usr/local/lib/python3.10/dist-packages/torch/optim/optimizer.py in wrapper(*args, **kwargs)
    383                 )
    384
--> 385                 out = func(*args, **kwargs)
    386                 self._optimizer_step_code()
    387

/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py in decorate_context(*args, **kwargs)
    113     def decorate_context(*args, **kwargs):
    114         with ctx_factory():
--> 115             return func(*args, **kwargs)
    116
    117     return decorate_context

/usr/local/lib/python3.10/dist-packages/transformers/optimization.py in step(self, closure)
    574                 # Decay the first and second moment running average coefficient
    575                 # In-place operations to update the averages at the same time
--> 576                 exp_avg.mul_(beta1).add_(grad, alpha=(1.0 - beta1))
    577                 exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1.0 - beta2)
    578                 denom = exp_avg_sq.sqrt().add_(group["eps"])

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
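For context, this error means some tensors involved in the optimizer update live on cuda:0 while others live on the CPU. In this course example the usual cause is that the batches are moved to `device` inside the loop but the model itself was never moved, so its parameters stay on the CPU. A minimal sketch of the device handling that avoids the mismatch, using a stand-in `nn.Linear` in place of the Transformer model (the names here are illustrative, not from the course code):

```python
import torch
from torch import nn

# Choose one device for everything.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in for the Transformer model; the key step is model.to(device),
# which moves all parameters to the chosen device before training.
model = nn.Linear(4, 2)
model.to(device)

# Create the optimizer after the model is on its final device.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# One training step: the batch must be moved to the SAME device.
batch = {"x": torch.randn(8, 4), "y": torch.randint(0, 2, (8,))}
batch = {k: v.to(device) for k, v in batch.items()}

outputs = model(batch["x"])
loss = nn.functional.cross_entropy(outputs, batch["y"])
loss.backward()
optimizer.step()       # parameters, gradients, and optimizer state all agree
optimizer.zero_grad()
```

If `model.to(device)` is omitted while the batches are still moved to cuda:0, `optimizer.step()` mixes CPU parameters with GPU gradients, which produces exactly the "Expected all tensors to be on the same device" error shown above.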