Freeze Lower Layers with Auto Classification Model

I’ve been unsuccessful in freezing the lower pretrained BERT layers when training a classifier with Hugging Face. I’m specifically using AutoModelForSequenceClassification, via the code below, and I want to freeze the lower X layers (e.g., the lower 9 layers). Is this possible in Hugging Face, and if so, what code would I add to get this functionality?

from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

def tokenize_function(examples):
    return tokenizer(examples["text"], max_length=512, padding="max_length", truncation=True)

# train_dataset / test_dataset: datasets.Dataset objects loaded earlier (names assumed)
tokenized_train = train_dataset.map(tokenize_function, batched=True)
tokenized_test = test_dataset.map(tokenize_function, batched=True)

model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=1)

# Attempted freeze that does not work: PyTorch ignores a _trainable attribute
for w in model.bert.parameters():
    w._trainable = False

training_args = TrainingArguments("test_trainer", evaluation_strategy="epoch", per_device_train_batch_size=8)
trainer = Trainer(model=model, args=training_args, train_dataset=tokenized_train, eval_dataset=tokenized_test)


Yes, in PyTorch freezing layers is quite easy: set requires_grad = False on every parameter you want to freeze. It can be done as follows:

from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=1)

for name, param in model.named_parameters():
    if name.startswith("..."):  # choose whatever you like here
        param.requires_grad = False
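To apply this to the original goal of freezing the lower N encoder layers, a minimal sketch could parse the layer index out of each parameter name instead of matching prefixes one by one. It assumes BERT-style names such as bert.embeddings.* and bert.encoder.layer.<i>.*. The helper freeze_lower_layers and the tiny randomly initialised BertConfig (used only so the example runs without downloading weights) are illustrative choices, not part of the original answer, and freezing the embeddings as well is also an assumption; drop that branch if you want them trained.

```python
import re

from transformers import BertConfig, BertForSequenceClassification


def freeze_lower_layers(model, num_frozen, freeze_embeddings=True):
    """Hypothetical helper: set requires_grad=False on encoder layers 0..num_frozen-1."""
    layer_pattern = re.compile(r"^bert\.encoder\.layer\.(\d+)\.")
    for name, param in model.named_parameters():
        match = layer_pattern.match(name)
        if match is not None and int(match.group(1)) < num_frozen:
            param.requires_grad = False
        elif freeze_embeddings and name.startswith("bert.embeddings."):
            param.requires_grad = False


# Tiny random config so the sketch runs offline; with real weights you would use
# AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=1).
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=12,
    num_attention_heads=2,
    intermediate_size=64,
    num_labels=1,
)
model = BertForSequenceClassification(config)

# Freeze the embeddings plus encoder layers 0..8 ("lower 9 layers").
freeze_lower_layers(model, num_frozen=9)

frozen = [n for n, p in model.named_parameters() if not p.requires_grad]
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
```

The pooler and classifier head stay trainable, along with encoder layers 9 to 11.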

Thank you so much, nielsr, for the quick and useful reply. I believe I got this to work. So, to verify: this code can be placed before the Trainer is created, and it will freeze any specified parameter? For example, I could write the code below to freeze the first two layers.

for name, param in model.named_parameters():
    if name.startswith("bert.encoder.layer.1"):
        param.requires_grad = False
    if name.startswith("bert.encoder.layer.2"):
        param.requires_grad = False

This question shows my ignorance, but is there a way to print model settings prior to training to verify which layers/parameters are frozen?

To verify which layers are frozen, you can do:

for name, param in model.named_parameters():
    print(name, param.requires_grad)
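For a more compact check than printing every name, a sketch like the following totals the trainable and frozen parameter counts and lists which encoder layers still contain trainable weights. The helper summarize_trainable is hypothetical, and it is run here on a tiny randomly initialised BertForSequenceClassification (arbitrary config values) so it does not download weights.

```python
import re

from transformers import BertConfig, BertForSequenceClassification


def summarize_trainable(model):
    """Hypothetical helper: count trainable/frozen parameters and list trainable encoder layers."""
    trainable_count = 0
    frozen_count = 0
    trainable_layers = set()
    for name, param in model.named_parameters():
        if param.requires_grad:
            trainable_count += param.numel()
            match = re.match(r"bert\.encoder\.layer\.(\d+)\.", name)
            if match:
                trainable_layers.add(int(match.group(1)))
        else:
            frozen_count += param.numel()
    return trainable_count, frozen_count, sorted(trainable_layers)


config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=4,
                    num_attention_heads=2, intermediate_size=64, num_labels=1)
model = BertForSequenceClassification(config)

# Freeze encoder layers 0 and 1 (trailing dot so a prefix cannot match a longer index).
for name, param in model.named_parameters():
    if name.startswith(("bert.encoder.layer.0.", "bert.encoder.layer.1.")):
        param.requires_grad = False

trainable, frozen, layers = summarize_trainable(model)
print(f"trainable={trainable:,} frozen={frozen:,} trainable encoder layers={layers}")
```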

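I would just add that you probably also want to freeze layer 0 (layers are 0-indexed), and you don’t want to accidentally freeze layers 10 and 11 (with 12 layers, numbered 0 to 11), whose names also start with “bert.encoder.layer.1”. Matching on “bert.encoder.layer.1.” (with the trailing dot) rather than “bert.encoder.layer.1” avoids this.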
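The prefix pitfall can be seen without loading a model at all. The sketch below uses a hand-written list of BERT-style parameter names (an assumption standing in for model.named_parameters() on a 12-layer model) and compares matching with and without the trailing dot.

```python
# Hand-written stand-ins for names from model.named_parameters() in a 12-layer BERT.
names = [f"bert.encoder.layer.{i}.attention.self.query.weight" for i in range(12)]


def matched_layers(prefix):
    """Return the layer indices whose parameter names start with `prefix`."""
    return [int(n.split(".")[3]) for n in names if n.startswith(prefix)]


without_dot = matched_layers("bert.encoder.layer.1")   # also catches layers 10 and 11
with_dot = matched_layers("bert.encoder.layer.1.")     # only layer 1
print(without_dot, with_dot)
```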

May I know whether subsequent operations such as model.train() and model.eval() change the param.requires_grad values specified above? Or do I have to repeat the above every time I switch between training and eval mode?
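One way to check this empirically is to freeze some parameters, toggle the mode, and compare the requires_grad flags before and after. The sketch below does that on a tiny randomly initialised BERT classifier (the config values and the choice to freeze only the embeddings are arbitrary assumptions so it runs offline). In PyTorch, model.train() and model.eval() set each module's training flag, which changes things like dropout behaviour, so the flags are expected to stay the same; running the same check on your own model is the safer confirmation.

```python
from transformers import BertConfig, BertForSequenceClassification

config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64, num_labels=1)
model = BertForSequenceClassification(config)

# Freeze the embeddings only, as an arbitrary example.
for name, param in model.named_parameters():
    if name.startswith("bert.embeddings."):
        param.requires_grad = False


def grad_flags(m):
    """Snapshot of requires_grad for every named parameter."""
    return {name: p.requires_grad for name, p in m.named_parameters()}


before = grad_flags(model)
model.eval()    # switches dropout etc. to inference behaviour
after_eval = grad_flags(model)
model.train()   # switches back to training behaviour
after_train = grad_flags(model)

print(before == after_eval == after_train)
```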