I am trying to figure out how to freeze layers of a model, and I read that I have to use

for param in model.base_model.parameters():
    param.requires_grad = False

if I want to freeze the encoder of a pretrained MLM, for example. But how do I use this with the Trainer?
I tried the following:
from transformers import (
    BertTokenizer,
    BertForMaskedLM,
    LineByLineTextDataset,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model = BertForMaskedLM.from_pretrained('bert-base-uncased')
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Freeze the encoder
for param in model.base_model.parameters():
    param.requires_grad = False

# in_path / out_path are defined elsewhere
dataset = LineByLineTextDataset(
    tokenizer=tokenizer,
    file_path=in_path,
    block_size=512,
)

data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

training_args = TrainingArguments(
    output_dir=out_path,
    overwrite_output_dir=True,
    num_train_epochs=25,
    per_device_train_batch_size=48,
    save_steps=500,
    save_total_limit=2,
    seed=1,
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=dataset,
)

trainer.train()
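As a sanity check that the freeze itself took effect before training, one can count the trainable parameters with plain PyTorch (after the loop, only the MLM head model.cls should still require gradients):

# Sanity check: after the freeze, only the MLM head (model.cls)
# should still have trainable parameters.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} / total: {total:,}")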
If the encoder were actually frozen during training, I would expect it to produce the same outputs as a fresh instance of the pretrained encoder, but it doesn't:
import torch

model_fresh = BertForMaskedLM.from_pretrained('bert-base-uncased')
inputs = tokenizer("This is a boring test sentence", return_tensors="pt")

# Compare the encoder outputs of the trained and the fresh model
torch.all(model.bert(**inputs).last_hidden_state.eq(model_fresh.bert(**inputs).last_hidden_state))
# --> tensor(False)
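One thing I am not sure about: as far as I understand, trainer.train() leaves the model in training mode while from_pretrained() returns models in eval mode, so dropout alone could make the outputs differ. A stricter version of the check would put both models in eval mode and disable gradient tracking:

# Deterministic comparison: disable dropout and gradient tracking
model.eval()
model_fresh.eval()
with torch.no_grad():
    out = model.bert(**inputs).last_hidden_state
    out_fresh = model_fresh.bert(**inputs).last_hidden_state
print(torch.all(out.eq(out_fresh)))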
So I must be doing something wrong here. I guess the Trainer is resetting the requires_grad attribute, and I have to overwrite it somehow after I have instantiated the trainer?
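If so, something like this is what I had in mind (just a sketch of my guess; trainer.model is the model instance the Trainer actually uses):

# Re-apply the freeze on the model held by the Trainer,
# after the Trainer has been instantiated
for param in trainer.model.base_model.parameters():
    param.requires_grad = False

trainer.train()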
Thanks in advance!