If the encoder were frozen, I would expect it to produce the same outputs as a fresh instance of the pretrained encoder, but it doesn't:
# `model` and `tokenizer` are the fine-tuned model and its tokenizer from the snippet above
model_fresh = BertForMaskedLM.from_pretrained('bert-base-uncased')
inputs = tokenizer("This is a boring test sentence", return_tensors="pt")
torch.all(model.bert(**inputs)[0].eq(model_fresh.bert(**inputs)[0]))
--> tensor(False)
So I must be doing something wrong here. I guess the Trainer is resetting the requires_grad attribute, and I have to overwrite it somehow after I have instantiated the Trainer?
Looking at the source code of BertForMaskedLM, the base model is stored in the “bert” attribute, not in a “base_model” attribute. So if you want to freeze the parameters of the base model before training, you should write
for param in model.bert.parameters():
    param.requires_grad = False
@nielsr base_model is an attribute that works on any PreTrainedModel (to make it easy to access the encoder in a generic fashion).
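For example (a quick sketch, not tested here, but base_model simply resolves to the encoder submodule of whichever architecture you loaded):

from transformers import BertForMaskedLM, RobertaForMaskedLM

bert_model = BertForMaskedLM.from_pretrained("bert-base-uncased")
roberta_model = RobertaForMaskedLM.from_pretrained("roberta-base")

# base_model points at the encoder, whatever the architecture-specific attribute is called
assert bert_model.base_model is bert_model.bert
assert roberta_model.base_model is roberta_model.roberta

# so the freezing loop can be written generically
for param in bert_model.base_model.parameters():
    param.requires_grad = False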
The Trainer puts your model into training mode, so your difference might simply come from that (there are dropouts in the model). You should check if putting it back in eval mode solves your problem.
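Something like this (just a sketch, untested) should tell you whether dropout is the cause, by putting both models in eval mode before comparing:

model.eval()        # disable dropout in your trained model
model_fresh.eval()  # from_pretrained should already return the model in eval mode, but be explicit
with torch.no_grad():
    out_trained = model.bert(**inputs)[0]
    out_fresh = model_fresh.bert(**inputs)[0]
print(torch.all(out_trained.eq(out_fresh)))  # should be tensor(True) if the encoder was really frozen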
Yeah, this is because you are using RoBERTa instead of BERT, so the encoder is stored under .roberta. I believe there is some model-independent attribute like “base_model”, but I don’t know right now (I’m on vacation, but maybe you can try it or google it). Hope that helps!
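So for the RoBERTa case something like this should work (untested sketch):

# for RoBERTa the encoder lives under .roberta instead of .bert
for param in model.roberta.parameters():
    param.requires_grad = False

# or, model-independent, via the base_model attribute mentioned above
for param in model.base_model.parameters():
    param.requires_grad = False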