I have been trying to fine-tune the decoder of a large encoder-decoder model for the MT task by inserting an adapter after the FFN block, but the BLEU score is poor compared to full fine-tuning.
Can anyone suggest a better method, or point me to some resources on fine-tuning the decoder of a model?
Have you managed to fix it? I faced the exact same problem.
Yes, it was an issue with the initialization of the layer norms in the network. I had to initialize the layer norms manually, which fixed the issue.
Could you please share the notebook with me?
I am not using a notebook, but I can share the code that initializes the new LayerNorm params:
from torch.nn import init

# `model` is the network that already contains the adapter modules
for name, param in model.named_parameters():
    # Only touch the LayerNorm parameters inside the adapters
    if "language_adaptor" in name and "norm" in name:
        if "weight" in name:  # Layer normalization scale
            print("Initializing {} to ones".format(name))
            init.ones_(param.data)
        elif "bias" in name:  # Layer normalization bias
            print("Initializing {} to zeros".format(name))
            init.zeros_(param.data)
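For anyone who wants to try this without the full model, here is a rough, self-contained sketch of the same idea on a toy layer. The `LanguageAdaptor` and `ToyDecoderLayer` classes, the bottleneck design, and the `reset_adaptor_norms` helper are all hypothetical stand-ins, not the actual adapter from this thread; the only part taken from the snippet above is resetting LayerNorm scale to ones and bias to zeros for modules whose name contains `language_adaptor`.

```python
import torch
from torch import nn


class LanguageAdaptor(nn.Module):
    """Hypothetical bottleneck adapter: LayerNorm -> down -> ReLU -> up, plus a residual."""

    def __init__(self, d_model: int, bottleneck: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(torch.relu(self.down(self.norm(x))))


class ToyDecoderLayer(nn.Module):
    """Stand-in for one decoder layer: an FFN followed by the adapter."""

    def __init__(self, d_model: int = 16):
        super().__init__()
        self.ffn = nn.Linear(d_model, d_model)
        self.language_adaptor = LanguageAdaptor(d_model, bottleneck=4)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.language_adaptor(self.ffn(x))


def reset_adaptor_norms(model: nn.Module, tag: str = "language_adaptor") -> list:
    """Set LayerNorm weight to 1 and bias to 0 in every submodule whose name contains `tag`."""
    touched = []
    for name, module in model.named_modules():
        if tag in name and isinstance(module, nn.LayerNorm):
            nn.init.ones_(module.weight)
            nn.init.zeros_(module.bias)
            touched.append(name)
    return touched


layer = ToyDecoderLayer()
with torch.no_grad():
    # Simulate a bad initialization of the adapter's LayerNorm scale
    layer.language_adaptor.norm.weight.normal_()

touched = reset_adaptor_norms(layer)
out = layer(torch.randn(2, 5, 16))
```

Matching on `isinstance(module, nn.LayerNorm)` instead of the substring `"norm"` is just a design choice in this sketch; it avoids accidentally hitting other parameters that happen to have "norm" in their name.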