I am running a multiobjective problem where I compute three losses and then sum them up. For each loss, I want a learnable coefficient (alpha, beta, and gamma, respectively) that will be optimized.
optimizer = AdamW(model.parameters(), lr=2e-5, eps=1e-8)
for batch in dl:
    optimizer.zero_grad()
    result = model(batch)
    loss1 = loss_fn_1(result)
    loss2 = loss_fn_2(result)
    loss3 = loss_fn_3(result)
    # How to optimize alpha, beta, and gamma?
    loss = alpha*loss1 + beta*loss2 + gamma*loss3
    loss.backward()
    optimizer.step()
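For completeness, here is a minimal runnable version of what I have in mind, with a toy model and placeholder losses standing in for mine, and with the coefficients declared as `nn.Parameter` and handed to `AdamW` (that declaration is a guess on my part, which is exactly what I am asking about):

```python
import torch
from torch import nn
from torch.optim import AdamW

# Toy stand-ins for my real model and loss functions.
model = nn.Linear(4, 2)
alpha = nn.Parameter(torch.tensor(1.0))
beta = nn.Parameter(torch.tensor(1.0))
gamma = nn.Parameter(torch.tensor(1.0))

# My guess: pass the coefficients to the optimizer alongside the model weights.
optimizer = AdamW(list(model.parameters()) + [alpha, beta, gamma],
                  lr=2e-5, eps=1e-8)

batch = torch.randn(8, 4)
result = model(batch)
loss1 = result.pow(2).mean()        # placeholder for loss_fn_1
loss2 = result.abs().mean()         # placeholder for loss_fn_2
loss3 = (result - 1).pow(2).mean()  # placeholder for loss_fn_3

loss = alpha * loss1 + beta * loss2 + gamma * loss3
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Note d(loss)/d(alpha) = loss1, which is positive here, so each step
# pushes alpha down -- this is the "goes to 0.0" worry in my question.
```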
Specific questions:

1. Should I even have coefficients alpha, beta, and gamma? The optimizer will minimize, so they'll all go to 0.0, right?
2. If having those coefficients is a good idea, how can I prevent them from going to 0.0? Someone told me to use regularization, but what does that mean in this case?
3. How do I declare alpha, beta, and gamma to be learnable by AdamW?