I’m new to “accelerate” and am trying to port some (working) code to support multi-GPU training. The full code is too lengthy to include, but I believe this is the relevant excerpt:
from accelerate import Accelerator
import numpy as np
import torch
import torch.nn as nn

accelerator = Accelerator()
device = accelerator.device
# build model, choose optimizer and scheduler, build dataloaders, etc etc
# ...
# Specify a tensor needed by the loss function
class_weights = torch.tensor(np.array([0.1, 0.2, 0.3]), dtype=torch.float)
# Put everything onto appropriate GPU (?)
(class_weights, model, optimizer, scheduler,
 dataloaders["train"], dataloaders["test"], dataloaders["valid"]) = accelerator.prepare(
    class_weights, model, optimizer, scheduler,
    dataloaders["train"], dataloaders["test"], dataloaders["valid"])

# Define training loop
def train(model, optimizer, scheduler, weight):
    criterion = nn.CrossEntropyLoss(weight=weight)
    for epoch in range(10):
        model.train()
        with torch.set_grad_enabled(True):
            for bi, (inputs, labels) in enumerate(dataloaders["train"]):
                optimizer.zero_grad()
                outputs = model(inputs)
                loss = criterion(outputs, labels)
                # (backward/step elided; the exception fires before we get there)

# Actually train
train(model, optimizer, scheduler, class_weights)
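My mental model (possibly wrong, and part of what I'm asking) is that passing the plain tensor through prepare() is roughly equivalent to moving it by hand, i.e. something like:

# What I *assumed* prepare() does with a plain tensor -- this is my
# guess about the intended usage, not something I found in the docs:
class_weights = class_weights.to(accelerator.device)

If tensors are supposed to be moved explicitly like that instead of going through prepare(), I'd be happy to learn that.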
The code raises an exception when it gets to the loss = criterion... line:
...
File ~/Mirabolic/fezzik/raceblind/venv/lib/python3.10/site-packages/torch/nn/functional.py:3029, in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction, label_smoothing)
3027 if size_average is not None or reduce is not None:
3028 reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 3029 return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
RuntimeError: Expected all tensors to be on the same device, but found at least
two devices, cuda:0 and cpu! (when checking argument for argument weight in
method wrapper_CUDA_nll_loss_forward)
I initially assumed that either outputs or labels was somehow still on the CPU, but if I examine the variables right before the call to criterion, they both seem to be on the (same) GPU:
In [4]: labels.is_cuda
Out[4]: True
In [5]: labels.get_device()
Out[5]: 0
In [6]: outputs.is_cuda
Out[6]: True
In [7]: outputs.get_device()
Out[7]: 0
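Since the traceback complains specifically about "argument weight", I suspect the class-weight tensor is the one stuck on the CPU; the next check I plan to run is something like this (a sketch; criterion here is the nn.CrossEntropyLoss built inside train(), which registers the weight= argument as the buffer criterion.weight):

# Confirm which tensor cross_entropy is unhappy about:
print(class_weights.device)     # the tensor I passed through prepare()
print(criterion.weight.device)  # the buffer the loss actually uses

But even if that does show cpu, I still don't understand why prepare() didn't move it.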
I’m not sure how to proceed and would be very grateful for any suggestions.
FWIW, I’m using PyTorch 2.0.1+cu117 and accelerate 0.23.0; the system has two V100 GPUs.