Hi, I'm looking for a way to reset a layer in a pre-trained model. For example, in a BART model, if I want to reset the last layer of the decoder, how should I implement it?
I noticed there is an _init_weights() method, which should be helpful. So I'm wondering if the code should look like this:
# load the pre-trained model
from transformers import AutoModelForSeq2SeqLM
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large")
# reset a specific decoder layer at index n
model._init_weights(model.get_decoder().layers[n:n+1])
But I don't think I've got it right, because the fine-tuning results don't change. Any ideas on this implementation? Thank you!
I am also stuck on this. Looking at the code of _init_weights, it seems to expect individual modules such as nn.Linear.
This would require looping over all the modules of your model that you would like to re-initialize and passing each one to _init_weights. But this might not translate to a new model, as the layer structure could be different. Is there not a way to just re-initialize a whole layer, or all modules under some component (e.g. BertLayer)?
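For what it's worth, the manual loop described above can be sketched without downloading a model. This uses a hypothetical toy module in place of a real transformer layer, and a stand-in init function shaped like _init_weights (the real one also handles embeddings, LayerNorm, etc.):

```python
import torch
import torch.nn as nn

# Hypothetical toy model standing in for the layers you want to reset;
# a real model's layer structure will differ.
layer = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 8))

def init_weights(module):
    # Mirrors the general shape of _init_weights: it acts on one
    # leaf module at a time, not on a whole layer.
    if isinstance(module, nn.Linear):
        nn.init.normal_(module.weight, mean=0.0, std=0.02)
        nn.init.zeros_(module.bias)

# Manual loop: visit every submodule and pass each one individually.
for module in layer.modules():
    init_weights(module)
```

This works, but as noted it ties your code to one model's layer layout, which is what makes a recursive helper preferable.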
Okay, I think I have figured it out. You can recursively apply _init_weights to all submodules using apply. So, for example, if you wanted to re-initialize the last layer, the following should work:
from transformers import AutoModel
model = AutoModel.from_pretrained("bert-base-uncased")
# Print the weights before and after the call to _init_weights to confirm they have been re-initialized
# print(model.encoder.layer[-1].attention.output.dense.weight)
model.encoder.layer[-1].apply(model._init_weights)
# print(model.encoder.layer[-1].attention.output.dense.weight)
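Mapping this back to the original BART question, the same pattern should carry over: calling apply on a single decoder layer, e.g. model.get_decoder().layers[n].apply(model._init_weights) (note the single index, not a slice), ought to re-initialize everything inside it, because apply recurses into nested submodules. A torch-only sketch of that recursion, again using a hypothetical toy module and a stand-in init function rather than the real BART layer:

```python
import torch
import torch.nn as nn

# Hypothetical nested module standing in for one decoder layer.
layer = nn.Sequential(nn.Linear(4, 4), nn.Sequential(nn.Linear(4, 4)))

def init_weights(module):
    # Stand-in for model._init_weights: resets one module at a time.
    if isinstance(module, nn.Linear):
        nn.init.normal_(module.weight, mean=0.0, std=0.02)
        nn.init.zeros_(module.bias)

# apply() recurses into every nested submodule, so the Linear inside
# the inner nn.Sequential is reached as well.
layer.apply(init_weights)
```

This is why the original slice-based call didn't work: _init_weights received a container it doesn't know how to initialize, so it silently did nothing, whereas apply hands it each leaf module in turn.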