How to obtain gradients on different GPUs to do custom accumulations

Hello everyone

I am currently facing a challenge in my work:
I need to obtain the gradients with respect to different data shards and adjust them before performing gradient descent.

However, running autograd on the different losses serially (as in the toy example below) is very slow.
So I want to compute the loss shards on different GPUs in parallel and obtain their individual gradients.

The current (serial) code is:

import torch
import torch.nn as nn
import torch.optim as optim

class SimpleNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleNN, self).__init__()
        self.layer1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.output_layer = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = self.layer1(x)
        x = self.relu(x)
        x = self.output_layer(x)
        return x


input_size, hidden_size, output_size = 10, 5, 1
model = SimpleNN(input_size, hidden_size, output_size)
criterion = nn.MSELoss(reduction="none")  # keep one loss per sample instead of averaging
optimizer = optim.SGD(model.parameters(), lr=0.01)

batch_size = 5
input_data = torch.rand(batch_size, input_size)
target_data = torch.rand(batch_size, output_size)
losses = criterion(model(input_data), target_data).flatten()  # one scalar loss per sample

# The pattern I want to accelerate
grads_list = []
for loss in losses:
    pseudo_grads = torch.autograd.grad(loss, model.parameters(), retain_graph=True)
    grads_list.append(pseudo_grads)

custom_grads = func(grads_list)  # func: my custom accumulation (placeholder)
for p, grad in zip(model.parameters(), custom_grads):
    p.grad = grad

optimizer.step()
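
What I have in mind is roughly the sketch below: replicate the model onto each visible GPU, send one data shard to each replica, and collect the per-shard gradients for the custom accumulation. It reuses the toy model, criterion, and data from the code above and assumes at least two GPUs are visible; I am not sure whether the shards actually run in parallel this way, which is part of my question.

import copy

# One replica of the toy model per visible GPU, all starting from the same weights.
devices = [torch.device(f"cuda:{i}") for i in range(torch.cuda.device_count())]
replicas = [copy.deepcopy(model).to(d) for d in devices]

# Split the batch into one shard per device.
input_shards = torch.chunk(input_data, len(devices))
target_shards = torch.chunk(target_data, len(devices))

# Launch one forward + backward per device; the CUDA kernels are queued
# asynchronously, so in principle the devices could work concurrently.
per_shard_grads = []
for replica, d, x, y in zip(replicas, devices, input_shards, target_shards):
    shard_loss = criterion(replica(x.to(d)), y.to(d)).mean()
    per_shard_grads.append(torch.autograd.grad(shard_loss, replica.parameters()))

# Move everything to one device for the custom accumulation
# (these copies are what finally synchronize the devices).
grads_list = [[g.to(devices[0]) for g in shard] for shard in per_shard_grads]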

I have spent a considerable amount of time researching this issue but have not found a workable scheme.
Your insights and suggestions would be greatly appreciated.