Hi, I’m running into a problem with the code from the Hugging Face course section “Write your training loop in PyTorch”: a RuntimeError is raised when the program executes the statement `optimizer.step()`. I verified that my code is exactly the same as the code in the Hugging Face video. I’d be grateful if anyone has a solution that fixes the issue.
The code is below:
```python
from tqdm.auto import tqdm

progress_bar = tqdm(range(num_training_steps))

model.train()
for epoch in range(num_epochs):
    for batch in train_dataloader:
        batch = {k: v.to(device) for k, v in batch.items()}
        outputs = model(**batch)
        loss = outputs.loss
        loss.backward()
        optimizer.step()  # <-- the error occurs here (same error on Google Colab and on an Intel server with a GPU)
        lr_scheduler.step()
        optimizer.zero_grad()
        progress_bar.update(1)
```
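For reference, the names used in the loop (`optimizer`, `lr_scheduler`, `num_training_steps`, `device`) come from the course’s earlier setup cells, which I ran essentially as follows (a sketch from memory; the learning rate and schedule are the course defaults, and `AdamW` is imported from `transformers` since the traceback below goes through `transformers/optimization.py`):

```python
import torch
from transformers import AdamW, get_scheduler

# Optimizer over the model's parameters (course default learning rate)
optimizer = AdamW(model.parameters(), lr=5e-5)

# Linear decay schedule over the total number of training steps
num_epochs = 3
num_training_steps = num_epochs * len(train_dataloader)
lr_scheduler = get_scheduler(
    "linear",
    optimizer=optimizer,
    num_warmup_steps=0,
    num_training_steps=num_training_steps,
)

# Move the model to the GPU if one is available
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
model.to(device)
```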
This is the RuntimeError traceback from Google Colab:
```
RuntimeError                              Traceback (most recent call last)
<cell line: 6>()
     10         loss = outputs.loss
     11         loss.backward()
---> 12         optimizer.step()
     13         lr_scheduler.step()
     14         optimizer.zero_grad()

3 frames
/usr/local/lib/python3.10/dist-packages/torch/optim/lr_scheduler.py in wrapper(*args, **kwargs)
     73                 instance._step_count += 1
     74                 wrapped = func.__get__(instance, cls)
---> 75                 return wrapped(*args, **kwargs)
     76
     77             # Note that the returned function here is no longer a bound method,

/usr/local/lib/python3.10/dist-packages/torch/optim/optimizer.py in wrapper(*args, **kwargs)
    383                             )
    384
--> 385                 out = func(*args, **kwargs)
    386                 self._optimizer_step_code()
    387

/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py in decorate_context(*args, **kwargs)
    113     def decorate_context(*args, **kwargs):
    114         with ctx_factory():
--> 115             return func(*args, **kwargs)
    116
    117     return decorate_context

/usr/local/lib/python3.10/dist-packages/transformers/optimization.py in step(self, closure)
    574                 # Decay the first and second moment running average coefficient
    575                 # In-place operations to update the averages at the same time
--> 576                 exp_avg.mul_(beta1).add_(grad, alpha=(1.0 - beta1))
    577                 exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1.0 - beta2)
    578                 denom = exp_avg_sq.sqrt().add_(group["eps"])

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
```
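In case it helps with diagnosis, here is a quick check I can run after the failure to see which tensors are on which device (a sketch using standard PyTorch attributes; `model` and `optimizer` are the objects from the loop above):

```python
import torch

# Devices of the model's parameters (should all be cuda:0)
print({p.device for p in model.parameters()})

# Devices of the optimizer's state tensors (exp_avg, exp_avg_sq, ...)
for group in optimizer.param_groups:
    for p in group["params"]:
        for name, t in optimizer.state.get(p, {}).items():
            if torch.is_tensor(t):
                print(name, t.device)
```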