How to correctly freeze some of the Wav2Vec2-Bert's layers?

Hi everyone!

I’m following this blog post to fine-tune W2V2-Bert for a low-resource language: Fine-Tune W2V2-Bert for low-resource ASR with 🤗 Transformers.

In the training phase I attempted to freeze all of the encoder layers except the last two (keeping layers 22-23 and the LM head trainable), using the following piece of code:

for name, param in model.named_parameters():
    if (name not in ['lm_head.bias', 'lm_head.weight']
            and "encoder.layers.22" not in name
            and "encoder.layers.23" not in name):
        param.requires_grad = False

However, what I’m seeing is that only the language modelling head is getting trained and none of the encoder layers get any updates, not even layers 22-23, which have requires_grad = True.
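
In case it helps, this is roughly how I’m checking which parameters are trainable and whether anything actually moves after training (a minimal sketch; model is the Wav2Vec2-Bert model loaded as in the blog post, and the single training step is just a placeholder):

import torch

# Sanity check: list which parameters are still trainable after freezing.
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)  # expect lm_head.* and encoder.layers.22/23.* here

# Snapshot the weights, run one optimization step, then see what changed.
before = {name: p.detach().clone() for name, p in model.named_parameters()}

# ... run one training step here (e.g. a short trainer.train()) ...

updated = [
    name
    for name, p in model.named_parameters()
    if not torch.equal(before[name], p.detach())
]
print(updated)  # in my runs only the lm_head parameters show up here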

One easy workaround I found was to pass the optimizer two groups of parameters: layers 1-21 in one group with a learning rate of 0, and layers 22-23 in the second group with a learning rate of 2e-5 (I’m using a constant scheduler). Another way was to pass only the second set of parameters to the optimizer. However, in both cases gradients are still computed with respect to all the parameters, which wastes compute.
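
For reference, the two-parameter-group workaround looks roughly like this (a sketch only; the AdamW optimizer, the constant schedule, and the Trainer hookup are my assumptions about the setup, not exactly what the blog post does):

import torch
from transformers import get_constant_schedule

# Split parameters: everything I want frozen goes in a group with lr=0.
frozen, unfrozen = [], []
for name, param in model.named_parameters():
    if (name in ["lm_head.bias", "lm_head.weight"]
            or "encoder.layers.22" in name
            or "encoder.layers.23" in name):
        unfrozen.append(param)
    else:
        frozen.append(param)

optimizer = torch.optim.AdamW(
    [
        {"params": frozen, "lr": 0.0},     # effectively frozen
        {"params": unfrozen, "lr": 2e-5},  # layers 22-23 + LM head
    ]
)
scheduler = get_constant_schedule(optimizer)

# then passed to the Trainer, e.g.
# trainer = Trainer(..., optimizers=(optimizer, scheduler))

The second variant just passes unfrozen to AdamW instead of both groups, but since requires_grad stays True on every parameter, the backward pass still computes gradients for all of them.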

The code above works for simple feed-forward neural networks in plain PyTorch, so I was wondering whether this is an issue with the Transformers library. If not, can anyone tell me how to properly freeze layers?

Thanks!
@sgugger