I am trying to freeze deberta-v3-small
layers. I first froze 4 blocks with:
NUM_FROZEN_LAYERS = 83  # <--- this index corresponds to the last layer of block 4

# Freeze every parameter up to index NUM_FROZEN_LAYERS
for name, param in list(model.named_parameters())[:NUM_FROZEN_LAYERS]:
    param.requires_grad = False
This works fine. However, when I later tried to freeze 6 blocks (up to layer number 115), the following error was raised:
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
I played around and found that the error starts appearing after layer number 103, which is model.encoder.layer.6.attention.self.value_proj.weight.
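For context, I found that index by simply printing the enumerated parameter names, roughly like this:

# List every parameter index and name to see where each block begins and ends
for i, (name, param) in enumerate(model.named_parameters()):
    print(i, name, param.requires_grad)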
This looks very random to me. Does anyone know why this is happening, and whether there is another way to freeze certain blocks? Maybe I am not doing it the right way.
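To clarify what I mean by "another way": I was considering freezing by parameter name instead of slicing by index, roughly like the sketch below (this assumes block k shows up as encoder.layer.<k>. in the parameter names, as in the name above, and unlike my index-based loop it does not touch the embedding parameters):

NUM_FROZEN_BLOCKS = 6

for name, param in model.named_parameters():
    # Freeze any parameter belonging to encoder blocks 0..NUM_FROZEN_BLOCKS-1
    if any(f"encoder.layer.{k}." in name for k in range(NUM_FROZEN_BLOCKS)):
        param.requires_grad = False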