Hey,
I am trying to figure out how to freeze layers of a model and read that I had to use
for param in model.base_model.parameters():
    param.requires_grad = False
if I wanted to freeze the encoder of a pretrained MLM, for example. But how do I use this with the Trainer?
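To make sure the loop itself does what I think it does, I check the flags afterwards like this (just a quick sanity-check sketch; print_trainable is my own throwaway helper, not anything from transformers):

# Sanity check: list which parameters are still trainable and count the frozen ones
def print_trainable(model):  # hypothetical helper, just for inspection
    for name, param in model.named_parameters():
        if param.requires_grad:
            print("still trainable:", name)

frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"{frozen} of {total} parameters frozen")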
I tried the following:
import torch
from transformers import BertTokenizer, BertForMaskedLM, LineByLineTextDataset, DataCollatorForLanguageModeling, Trainer, TrainingArguments
model = BertForMaskedLM.from_pretrained('bert-base-uncased')
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
for param in model.base_model.parameters():
    param.requires_grad = False

dataset = LineByLineTextDataset(
    tokenizer=tokenizer,
    file_path=in_path,
    block_size=512,
)

data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

training_args = TrainingArguments(
    output_dir=out_path,
    overwrite_output_dir=True,
    num_train_epochs=25,
    per_device_train_batch_size=48,
    save_steps=500,
    save_total_limit=2,
    seed=1
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=dataset
)

trainer.train()
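And this is the kind of check I have in mind for the Trainer side, i.e. looking at the model the Trainer actually holds after train() returns (again just a sketch, nothing official):

# Sketch: see whether the requires_grad flags survived training
frozen_after = [name for name, p in trainer.model.named_parameters() if not p.requires_grad]
print(f"{len(frozen_after)} parameter tensors are still frozen in trainer.model")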
If the encoder were frozen, I would expect it to produce the same outputs as a fresh instance of the pretrained encoder, but it doesn't:
model_fresh = BertForMaskedLM.from_pretrained('bert-base-uncased')
inputs = tokenizer("This is a boring test sentence", return_tensors="pt")
torch.all(model.bert(**inputs)[0].eq(model_fresh.bert(**inputs)[0]))
--> tensor(False)
So I must be doing something wrong here. I guess the Trainer is resetting the requires_grad attribute, and I have to overwrite it somehow after I have instantiated the trainer?
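If re-applying the freeze after building the Trainer is really the way to go, I imagine it would look something like this (pure guesswork on my part, I haven't confirmed this is needed):

# Guess: re-freeze the encoder on the model the Trainer holds, before training
trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=dataset,
)
for param in trainer.model.base_model.parameters():
    param.requires_grad = False
trainer.train()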
Thanks in advance!
Johannes