I am using PyTorch Lightning to fine-tune a T5 transformer on a specific task. However, I was not able to understand how the fine-tuning works. I always see this code:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained(hparams.model_name_or_path)
model = AutoModelForSeq2SeqLM.from_pretrained(hparams.model_name_or_path)
I don’t get how the fine-tuning is done. Are they adding a fully connected layer on top of the T5 base model with the base kept trainable, or a fully connected layer on top of a non-trainable (frozen) base model?
Welcome! I’ll take a shot at answering this, but I’m not an expert at this, so I may be wrong!
As far as I understand, when you instantiate a model the weights are not frozen, so if you start fine-tuning, all parameters will be trainable. If you want to freeze weights, that’s something you’ll have to do manually, and how you do it depends on which library (PyTorch, TensorFlow, or Flax) you’re using.
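As a quick sanity check (just a minimal sketch; "t5-small" is only an example checkpoint, not something from the original post), you can confirm that every parameter of a freshly loaded model starts out trainable:

from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Freshly loaded weights all have requires_grad=True by default
print(all(p.requires_grad for p in model.parameters()))  # expected: True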
When you use AutoModelForSeq2SeqLM (or any of the other AutoModel classes) to instantiate a model with .from_pretrained, the backend that gets used is PyTorch (as per the Auto Classes documentation). So once you’ve loaded the model (with the PyTorch backend), if you want to freeze all of the base model’s weights, you can access them and freeze them with:
# Freeze every parameter of the base model
for param in model.base_model.parameters():
    param.requires_grad = False
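After running that loop, one quick way to see what is still trainable (a small verification sketch, not part of the original answer) is to count parameters:

# Count how many parameters will still receive gradient updates
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} / total: {total:,}")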
If you were to use one of the TensorFlow or Flax auto-models, then you’d have to follow those libraries’ methods for freezing layers, if that’s what you wanted to do.
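For TensorFlow, a rough sketch of the Keras-style equivalent might look like the following (assuming the checkpoint has TensorFlow weights available; "t5-small" is just an example, and which sub-layers you actually want to freeze depends on the architecture):

from transformers import TFAutoModelForSeq2SeqLM

tf_model = TFAutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Keras freezes layers via the `trainable` flag on each sub-layer
for layer in tf_model.layers:
    layer.trainable = False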
Thank you so much for your answer, it is a great help.
By the way, I am trying to add another fully connected layer to the head. Would you know how to do it?
I figured that I could go into the library source code and change the modeling_t5.py file.
It has the lm_head defined as:

self.lm_head = nn.Linear(config.d_model, config.vocab_size, bias=False)

Thus, I thought of changing it as below. Does that make sense, and do you think it is the right way of doing it?

self.lm_head = nn.Sequential(
    nn.Linear(config.d_model, config.vocab_size, bias=False),
    nn.Flatten(),
    nn.Linear(config.d_model, config.vocab_size, bias=False),
)
If you’re just adding layers to the head, I don’t think you need to edit the source code. If you needed to change something inside the network, e.g. to make changes to the T5Block, I think that’s when you would dive into the source code.
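For example, something along these lines should let you swap in a bigger head on the loaded model instead of editing modeling_t5.py (a rough, untested sketch: "t5-base" is just an example checkpoint, and I’ve sized the first layer d_model → d_model with a ReLU in between so the shapes line up; the new layers are randomly initialised, so they still need to be trained):

import torch.nn as nn
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")

d_model = model.config.d_model        # decoder hidden size
vocab_size = model.config.vocab_size  # output vocabulary size

# Reassign the head on the instance rather than changing the library source
model.lm_head = nn.Sequential(
    nn.Linear(d_model, d_model, bias=False),
    nn.ReLU(),
    nn.Linear(d_model, vocab_size, bias=False),
)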
I’m curious to learn more about this though, so do update this post with anything new that you learn!