This discussion has also been posted at https://github.com/huggingface/accelerate/issues/1154
Hi there.
I am new to accelerate and I’ve found that it really improves my development productivity. Thanks for your great work.
But I have some problems when using accelerator.gather.
I trained a simple resnet18 classifier on the CIFAR10 dataset. The training loop is:
for idx, (inputs, targets) in enumerate(train_loader):
    outputs = net(inputs)
    # ********************** loss plan 1 **********************
    loss = criterion(outputs, targets)
    # ********************** loss plan 1 **********************
    # ********************** loss plan 2 **********************
    # out_gather = accelerator.gather(outputs)
    # tar_gather = accelerator.gather(targets)
    # loss = criterion(out_gather, tar_gather)
    # ********************** loss plan 2 **********************
    optimizer.zero_grad()
    accelerator.backward(loss)
    optimizer.step()
The code above works well and the training accuracy reaches about 70% after 10 epochs.
But there is a problem when I train as follows:
for idx, (inputs, targets) in enumerate(train_loader):
    outputs = net(inputs)
    # ********************** loss plan 1 **********************
    # loss = criterion(outputs, targets)
    # ********************** loss plan 1 **********************
    # ********************** loss plan 2 **********************
    out_gather = accelerator.gather(outputs)
    tar_gather = accelerator.gather(targets)
    loss = criterion(out_gather, tar_gather)
    # ********************** loss plan 2 **********************
    optimizer.zero_grad()
    accelerator.backward(loss)
    optimizer.step()
The training loss barely changes, and the training accuracy stays at about 10%, which is equivalent to random guessing.
The code above may look odd, but I would not expect it to fail, and yet it does.
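One way to narrow this down, offered as a sketch and not as a confirmed diagnosis, is to check whether the tensor that feeds the loss still carries autograd history. The helper name grad_status is hypothetical, and the small tensors below only stand in for outputs and out_gather; in the real multi-GPU run it would be called on those two tensors right after the gather. If the gathered tensor reports no grad_fn, the loss has no path back to the weights of net, which would be consistent with a loss that does not move.

```python
import torch


def grad_status(name: str, tensor: torch.Tensor) -> str:
    """Summarize whether `tensor` is still attached to the autograd graph."""
    fn = type(tensor.grad_fn).__name__ if tensor.grad_fn is not None else "None"
    return f"{name}: requires_grad={tensor.requires_grad}, grad_fn={fn}"


# Stand-in tensors (assumption: these only mimic `outputs` and a detached copy).
weights = torch.ones(3, requires_grad=True)
attached = weights * 2        # produced by a differentiable op
detached = attached.detach()  # same values, no autograd history

attached_report = grad_status("attached", attached)
detached_report = grad_status("detached", detached)
print(attached_report)
print(detached_report)
```

In the training loop this would look like print(grad_status("outputs", outputs)) and print(grad_status("out_gather", out_gather)) on the first step only, so the log stays short.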
(The reason I’m doing this is that I want to use accelerate when training for contrastive learning tasks. In contrastive learning, the larger the batch_size, the better, and each sample in the batch uses all other samples in the batch as negative examples to calculate the loss. For example, when I train with four GPUs and the batch_size of each GPU is 64, I want each sample to be compared with 64*4-1 negative samples instead of 64-1. In this case I need to use accelerator.gather.)
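The arithmetic behind that example can be spelled out in a few lines. This is only an illustration of the counts mentioned above; the function name negatives_per_sample is made up for this sketch and is not part of accelerate.

```python
def negatives_per_sample(per_gpu_batch: int, num_gpus: int, gathered: bool) -> int:
    """Count the other samples one anchor sample can use as negatives."""
    # With a gather, every process sees the batches of all processes;
    # without it, only the local batch is visible.
    visible = per_gpu_batch * num_gpus if gathered else per_gpu_batch
    return visible - 1  # exclude the anchor sample itself


local_only = negatives_per_sample(64, 4, gathered=False)
with_gather = negatives_per_sample(64, 4, gathered=True)
print(local_only, with_gather)  # prints: 63 255
```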
The full code is as follows (it works well for loss plan 1 but not for loss plan 2):
# main.py
# CUDA_VISIBLE_DEVICES="0,1,2,3" accelerate launch --multi_gpu main.py
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
from accelerate import Accelerator
accelerator=Accelerator()
BATCH_SIZE = 256
EPOCHS = 10
if __name__ == "__main__":
    device = accelerator.device
    net = torchvision.models.resnet18(pretrained=False, num_classes=10)
    trainset = torchvision.datasets.CIFAR10(
        root="./data",
        train=True,
        download=True,
        transform=transforms.Compose(
            [
                transforms.RandomCrop(32, padding=4),
                transforms.RandomHorizontalFlip(),
                transforms.ToTensor(),
                transforms.Normalize(
                    (0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)
                ),
            ]
        ),
    )
    train_loader = torch.utils.data.DataLoader(
        trainset,
        batch_size=BATCH_SIZE,
        num_workers=4,
        pin_memory=True,
        shuffle=True,
    )
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(
        net.parameters(),
        lr=0.01 * 2,
        momentum=0.9,
        weight_decay=0.0001,
        nesterov=True,
    )
    net, optimizer, train_loader = accelerator.prepare(net, optimizer, train_loader)
    net.train()
    for ep in range(1, EPOCHS + 1):
        train_loss = correct = total = 0
        for idx, (inputs, targets) in enumerate(train_loader):
            outputs = net(inputs)
            # ********************** loss plan 1 **********************
            # loss = criterion(outputs, targets)
            # ********************** loss plan 1 **********************
            # ********************** loss plan 2 **********************
            out_gather = accelerator.gather(outputs)
            tar_gather = accelerator.gather(targets)
            loss = criterion(out_gather, tar_gather)
            # ********************** loss plan 2 **********************
            optimizer.zero_grad()
            accelerator.backward(loss)
            optimizer.step()
            train_loss += loss.item()
            total += targets.size(0)
            correct += torch.eq(outputs.argmax(dim=1), targets).sum().item()
            print(
                " == step: [{:3}/{}] [{}/{}] | loss: {:.3f} | acc: {:6.3f}%".format(
                    idx + 1,
                    len(train_loader),
                    ep,
                    EPOCHS,
                    train_loss / (idx + 1),
                    100.0 * correct / total,
                )
            )
I’m wondering where I’m going wrong with my code, or how I should use accelerator.gather correctly.
Thanks a lot.