"Expected all tensors to be on the same device..."

I’m new to “accelerate” and am trying to port some (working) code to support multi-GPU training. The full code is too lengthy to include, but I believe this is the relevant excerpt:

```python
import numpy as np
import torch
from torch import nn
from accelerate import Accelerator

device = Accelerator.device
accelerator = Accelerator()

# build model, choose optimizer and scheduler, build dataloaders, etc etc
# ...
# Specify a tensor needed by the loss function
class_weights = torch.tensor(np.array([.1, .2, .3]), dtype=torch.float)
# Put everything onto appropriate GPU (?)
(class_weights, model, optimizer, scheduler,
 dataloaders["train"], dataloaders["test"], dataloaders["valid"]) \
= accelerator.prepare(class_weights, model, optimizer, scheduler,
                      dataloaders["train"], dataloaders["test"], dataloaders["valid"])
# Define training loop
def train(model, optimizer, scheduler, weight):
    criterion = nn.CrossEntropyLoss(weight=weight)
    for epoch in range(10):
        with torch.set_grad_enabled(True):
            for bi, (inputs, labels) in enumerate(dataloaders["train"]):
                outputs = model(inputs)
                loss = criterion(outputs, labels)
# Actually train
train(model, optimizer, scheduler, class_weights)
```

The code raises an exception when it gets to the `loss = criterion(outputs, labels)` line:

```
File ~/Mirabolic/fezzik/raceblind/venv/lib/python3.10/site-packages/torch/nn/functional.py:3029, in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction, label_smoothing)
   3027 if size_average is not None or reduce is not None:
   3028     reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 3029 return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument weight in method wrapper_CUDA_nll_loss_forward)
```

I initially assumed that either outputs or labels was somehow still on the CPU, but if I examine the variables right before the call to criterion they both seem to be on the (same) GPU:

```
In [4]: labels.is_cuda
Out[4]: True

In [5]: labels.get_device()
Out[5]: 0

In [6]: outputs.is_cuda
Out[6]: True

In [7]: outputs.get_device()
Out[7]: 0
```
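Note that `cross_entropy` receives a third tensor besides `outputs` and `labels`: the loss's `weight`, which is exactly the argument the error message names. A minimal sketch of a check covering all three; `report_devices` is a hypothetical helper, and it assumes `criterion` is an `nn.CrossEntropyLoss`, which stores its optional `weight` as a buffer:

```python
import torch
from torch import nn


def report_devices(criterion, outputs, labels):
    """Hypothetical debugging helper: return the device of every tensor
    that nn.CrossEntropyLoss will hand to cross_entropy."""
    devices = {
        "outputs": outputs.device,
        "labels": labels.device,
    }
    # criterion.weight is None when no class weights were given
    if criterion.weight is not None:
        devices["weight"] = criterion.weight.device
    return devices


# CPU-only illustration: all three tensors share one device here. In the
# failing setup above, "weight" would report cpu while the others report cuda:0.
criterion = nn.CrossEntropyLoss(weight=torch.tensor([0.1, 0.2, 0.3]))
outputs = torch.randn(4, 3)
labels = torch.tensor([0, 1, 2, 1])
print(report_devices(criterion, outputs, labels))
```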

I’m not sure how to proceed and would be very grateful for any suggestions.

FWIW, I’m using PyTorch 2.0.1+cu117 and accelerate 0.23.0; the system has two V100 GPUs.

Aha, I think I’ve found my own problem.

First, I defined the device incorrectly (confusing the class `Accelerator` with an instance of it); it should be:

```python
accelerator = Accelerator()
device = accelerator.device
```
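Why the original line did not fail immediately: `device` is a property on `Accelerator`, and looking up a property on the class (rather than on an instance) returns the `property` object itself instead of raising. A minimal sketch with a hypothetical stand-in class, `FakeAccelerator`, whose placeholder device string is an assumption for illustration only:

```python
class FakeAccelerator:
    """Hypothetical stand-in for accelerate.Accelerator: `device` is a
    property, so it only yields a real value when accessed on an instance."""

    def __init__(self):
        self._device = "cuda:0"  # placeholder value for illustration

    @property
    def device(self):
        return self._device


wrong = FakeAccelerator.device    # looked up on the class
right = FakeAccelerator().device  # looked up on an instance

print(type(wrong).__name__)  # property
print(right)                 # cuda:0
```

So `device = Accelerator.device` silently binds a `property` object, which only causes trouble once something like `.to(device)` tries to use it.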

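Second, `accelerator.prepare` only handles models, optimizers, schedulers and dataloaders; any other object passed to it, such as a plain tensor, is returned unchanged, so `class_weights` never left the CPU. Now that `device` is defined correctly, we can add a line after the initial definition of `class_weights`: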

```python
class_weights = class_weights.to(device)
```

That seems (?) to fix all my problems.
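For reference, a sketch of two equivalent alternatives to the `.to(device)` line: create the weights directly on the target device, or move the whole loss module, which works because `nn.CrossEntropyLoss` registers `weight` as a buffer and `Module.to()` moves buffers. In the post, `device` would be `accelerator.device`; here a CUDA-if-available fallback stands in (an assumption) so the sketch also runs on a CPU-only machine:

```python
import torch
from torch import nn

# Stand-in for `accelerator.device` (assumption): CUDA if present, else CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Option A: create the class weights directly on the target device.
class_weights = torch.tensor([0.1, 0.2, 0.3], dtype=torch.float, device=device)
criterion_a = nn.CrossEntropyLoss(weight=class_weights)

# Option B: build the loss on the CPU, then move the whole module.
# `weight` is a registered buffer, so Module.to() moves it along.
criterion_b = nn.CrossEntropyLoss(weight=torch.tensor([0.1, 0.2, 0.3])).to(device)

# Dummy batch on the same device, standing in for model outputs and labels.
outputs = torch.randn(4, 3, device=device)
labels = torch.tensor([0, 1, 2, 1], device=device)
loss_a = criterion_a(outputs, labels)
loss_b = criterion_b(outputs, labels)
```

Option B keeps the weights tied to the loss module, so they follow any later `.to()` call on it.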