According to this thread, it seems like it would work fine: Multiple gradient updates with two separate losses and two classifiers sharing the same encoder - nlp - PyTorch Forums. During accumulation each loss is backpropagated separately and the gradients are just added into the same .grad buffers, which I believe wouldn't cause them to entangle.
The key here is to not call zero_grad() between the two backward calls (that would wipe out the gradients from the first loss). So I think we can essentially rewrite this as:
# Suppose gradient_accumulation_steps is set to 2 on the Accelerator.
optimizer = torch.optim.AdamW(unet.parameters())  # any optimizer works here

for batch in dataloader:  # the usual training loop
    with accelerator.accumulate(unet):
        outputs = unet(batch)
        loss1 = loss_func1(outputs)
        # retain_graph=True keeps the graph alive for the second backward.
        loss1.backward(retain_graph=True)
        loss2 = loss_func2(outputs)
        loss2.backward()
        optimizer.step()
        optimizer.zero_grad()
And this should work/be okay; the key is passing retain_graph=True on the first backward call so the graph isn't freed before the second one runs.
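To convince myself the two backward calls don't interfere, here is a minimal sketch in plain PyTorch (no Accelerate; the toy model, losses, and tensor shapes are placeholders I made up) checking that two backward passes, the first with retain_graph=True, accumulate the same gradients as a single backward on the summed loss:

import torch

torch.manual_seed(0)
toy_model = torch.nn.Linear(4, 4)
x = torch.randn(8, 4)

# Two separate backward passes through the same graph.
out = toy_model(x)
loss1 = out.pow(2).mean()
loss2 = out.abs().mean()
loss1.backward(retain_graph=True)  # keep the graph for the second backward
loss2.backward()
grads_two_passes = [p.grad.clone() for p in toy_model.parameters()]

# Single backward on the summed loss, for comparison.
toy_model.zero_grad()
out = toy_model(x)
(out.pow(2).mean() + out.abs().mean()).backward()
grads_combined = [p.grad.clone() for p in toy_model.parameters()]

print(all(torch.allclose(a, b) for a, b in zip(grads_two_passes, grads_combined)))
# Expected: True (the gradients just add up, they don't entangle).

One more note: since the snippet above already uses accelerator.accumulate(), I believe the more idiomatic calls would be accelerator.backward(loss1, retain_graph=True) and accelerator.backward(loss2) rather than calling .backward() on the losses directly, since accelerator.backward handles loss scaling (mixed precision / gradient accumulation) and should forward retain_graph through to the underlying backward.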