Hi all, I want to fine-tune a Wav2Vec2ForCTC model using 2 different learning rates for two parts of the model.
After extracting the 2 different parameter groups, I define my own optimizer as

    optim = torch.optim.AdamW([
        {'params': model2.wav2vec2.feature_extractor.parameters(), 'lr': 2e-5},
        {'params': base_params}
    ], lr=6e-4, weight_decay=0)
and my own learning rate scheduler (which is simply the Trainer's default one) as

    lr_scheduler = get_scheduler(
        'linear',
        optimizer=optim,
        num_warmup_steps=warmup_steps,
        num_training_steps=total_training_steps,
    )
I then pass both to the trainer as trainer = Trainer(..., optimizers=(optim, lr_scheduler)).
I assume this learning rate scheduler will affect both groups, won't it? That is, both learning rates should warm up from 0 to their respective maximum values (set in the optimizer) and then follow a linear decay.
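As a sanity check, here is a minimal standalone sketch (with dummy parameters and made-up step counts standing in for my real model and schedule) suggesting that the linear scheduler from get_scheduler scales every param group, not just the first one:

    import torch
    from transformers import get_scheduler

    # Dummy parameters standing in for the two real parameter groups.
    group_a = [torch.nn.Parameter(torch.zeros(1))]
    group_b = [torch.nn.Parameter(torch.zeros(1))]

    optim = torch.optim.AdamW([
        {'params': group_a, 'lr': 2e-5},
        {'params': group_b}
    ], lr=6e-4, weight_decay=0)

    lr_scheduler = get_scheduler(
        'linear',
        optimizer=optim,
        num_warmup_steps=10,
        num_training_steps=100,
    )

    for step in range(20):
        optim.step()
        lr_scheduler.step()
        # get_last_lr() returns one value per param group:
        # both ramp up from 0 towards 2e-5 and 6e-4 respectively.
        print(step, lr_scheduler.get_last_lr())

Running this prints two learning rates per step, each warming up towards its own maximum.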
It seems to be working so far, but when I am using Tensorboard to check its plots, the learning_rate
plot it offers only shows one learning rate. Shouldn’t it be showing the evolution of both groups’ learning rates? Am I missing something? Not only that, but when I am opening the trainer_state.json
file saved in the running model’s checkpoints, it only shows one LR again. Is this a sign that my 2 learning rates approach is not working? Thanks in advance.
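In case it helps, this is the small callback I am thinking of adding to watch both groups during training (just a sketch; the class name is mine, and I am assuming the Trainer passes the optimizer to callbacks in the keyword arguments, with the scheduler having already written the current value into each group's 'lr'):

    from transformers import TrainerCallback

    class PerGroupLRCallback(TrainerCallback):
        """Print the current learning rate of every param group at each logging step."""

        def on_log(self, args, state, control, logs=None, **kwargs):
            optimizer = kwargs.get('optimizer')
            if optimizer is not None:
                lrs = [group['lr'] for group in optimizer.param_groups]
                print(f'step {state.global_step}: per-group LRs = {lrs}')

I would pass it to the trainer via callbacks=[PerGroupLRCallback()]. If both values move as expected, then presumably only the logging reports a single LR (the first group's), and the two-learning-rate setup itself is fine.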