I am working on a multi-objective problem where I compute three losses and then sum them up. For each loss, I want a learnable coefficient (alpha, beta, and gamma, respectively) that will be optimized.
from torch.optim import AdamW

optimizer = AdamW(model.parameters(), lr=2e-5, eps=1e-8)

for batch in dl:
    optimizer.zero_grad()
    result = model(batch)
    loss1 = loss_fn_1(result)
    loss2 = loss_fn_2(result)
    loss3 = loss_fn_3(result)
    # How to optimize alpha, beta, and gamma?
    loss = alpha * loss1 + beta * loss2 + gamma * loss3
    loss.backward()
    optimizer.step()
Specific questions:
1. Should I even have coefficients alpha, beta, and gamma? The optimizer will minimize, so they'll all go to 0.0, right?
2. If having those coefficients is a good idea, how can I prevent them from going to 0.0? Someone told me to use regularization, but what does that mean in this case?
3. How do I declare alpha, beta, and gamma to be learnable by AdamW? (Roughly what I have in mind is sketched below.)
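Here is the kind of thing I am imagining for Q3, though I am not sure it is right (the initial values of 1.0 and the separate learning rate for the coefficient group are just guesses):

import torch
from torch.optim import AdamW

# Declare the coefficients as learnable scalar parameters.
alpha = torch.nn.Parameter(torch.tensor(1.0))
beta = torch.nn.Parameter(torch.tensor(1.0))
gamma = torch.nn.Parameter(torch.tensor(1.0))

# Hand them to AdamW together with the model's parameters,
# here as a separate parameter group with its own learning rate.
optimizer = AdamW(
    [
        {"params": model.parameters(), "lr": 2e-5},
        {"params": [alpha, beta, gamma], "lr": 1e-3},
    ],
    eps=1e-8,
)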
1. Theoretically, you have to impose a constraint such as alpha + beta + gamma = 1; otherwise, as you suspect, the optimizer will simply drive the coefficients toward zero, since that directly minimizes the loss.
2. To turn this into an unconstrained optimization, you apply a Lagrange multiplier to the constraint equation, and that is the "regularization" formula your friend was talking about, e.g. you put
lambda1*alpha, lambda2*beta and lambda3*gamma
into the loss function. I believe this complicates the problem even more, since finding the optimal values of the lambdas is difficult even theoretically.
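Just to make that concrete, a soft version of the constraint added to the loss could look like the sketch below. Note it uses a simple squared penalty on alpha + beta + gamma = 1 rather than true Lagrange multiplier terms, and the penalty weight is an arbitrary guess, so take it only as an illustration:

# Assumes alpha, beta, gamma are learnable parameters (see the sketch
# under the question) so the penalty has something to act on.
penalty_weight = 10.0  # arbitrary choice, would need tuning
constraint_penalty = penalty_weight * (alpha + beta + gamma - 1.0) ** 2

loss = alpha * loss1 + beta * loss2 + gamma * loss3 + constraint_penalty
loss.backward()
optimizer.step()

Whether this actually behaves well in practice is a separate question, for the reasons given above.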
2.5 Sorry, this does not answer your Q3, but I think the practical way is to treat alpha, beta, and gamma as hyperparameters and simply optimize them via grid search.
In that case, split off part of your training set as a validation set and define a metric on it. The validation metric has to be chosen to suit your problem (e.g. error rate, F1, Spearman correlation, or others); you can get ideas for metrics by finding Kaggle competitions similar to your problem and looking at the metrics they use.
Select the hyperparameters that optimize your validation metric.
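As a rough sketch of what that grid search could look like (train_and_evaluate, train_set, and val_set are hypothetical placeholders for your own training loop and data splits, and the candidate grid is arbitrary):

import itertools

# Candidate values for each coefficient; adjust the grid to your problem.
grid = [0.1, 0.5, 1.0]

best_metric, best_coeffs = None, None
for alpha, beta, gamma in itertools.product(grid, repeat=3):
    # Hypothetical helper: trains on the training split with these fixed
    # coefficients and returns the validation metric (assumed here to be
    # higher-is-better, e.g. F1).
    metric = train_and_evaluate(alpha, beta, gamma, train_set, val_set)
    if best_metric is None or metric > best_metric:
        best_metric, best_coeffs = metric, (alpha, beta, gamma)

print("best coefficients:", best_coeffs, "validation metric:", best_metric)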