How does the finetune on transformer (t5) work

mab · April 7, 2022, 2:42pm

I am using PyTorch Lightning to fine-tune a T5 transformer on a specific task, but I don’t understand how the fine-tuning actually works. I always see this code:

tokenizer = AutoTokenizer.from_pretrained(hparams.model_name_or_path)
model = AutoModelForSeq2SeqLM.from_pretrained(hparams.model_name_or_path)

I don’t get how the fine-tuning is done. Are they adding a fully connected layer on top of the T5 base model and training everything, or adding a fully connected layer on top of a frozen (non-trainable) base model?

NimaBoscarino · April 8, 2022, 11:24pm

Welcome! I’ll take a shot at answering this, but I’m not an expert at this so I may be wrong!

As far as I can understand, when you instantiate a model the weights are not frozen, so if you start fine-tuning the model all parameters will be trainable. If you want to freeze weights, that’s something you’ll have to set manually, and how you do that depends on which library (PyTorch, TensorFlow, or Flax) you’re using.
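To make the "everything is trainable by default" point concrete, here is a minimal sketch. It uses a tiny stand-in module (`TinySeq2Seq`, a hypothetical name that is not part of transformers) instead of a real T5 checkpoint, so it runs without downloading anything. The same `requires_grad` check should apply to a model loaded with `.from_pretrained`, though that is not shown here.

```python
import torch
import torch.nn as nn


class TinySeq2Seq(nn.Module):
    """Hypothetical stand-in for a pretrained model: a body plus an lm_head."""

    def __init__(self, d_model=8, vocab_size=20):
        super().__init__()
        self.body = nn.Linear(d_model, d_model)
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, x):
        return self.lm_head(torch.tanh(self.body(x)))


torch.manual_seed(0)
model = TinySeq2Seq()

# PyTorch parameters are created with requires_grad=True unless told otherwise.
all_trainable = all(p.requires_grad for p in model.parameters())

# Snapshot the weights, then run one plain training step on random data.
before = {name: p.detach().clone() for name, p in model.named_parameters()}
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(4, 8)
targets = torch.randint(0, 20, (4,))
loss = nn.functional.cross_entropy(model(x), targets)
loss.backward()
optimizer.step()

# Record, per parameter tensor, whether the step changed it.
changed = {
    name: not torch.equal(before[name], p.detach())
    for name, p in model.named_parameters()
}
print(all_trainable, changed)
```

With nothing frozen, the step updates the body and the head alike, which is what "fine-tuning all parameters" means in practice.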

When you use AutoModelForSeq2SeqLM (or any of the other AutoModelX classes) to instantiate a model with .from_pretrained, the backend that gets used is PyTorch (as per the Auto Classes documentation). So once you’ve loaded the model, if you want to freeze all of the base model’s weights you can access them and freeze them with:

for param in model.base_model.parameters():
    param.requires_grad = False
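A minimal sketch of the freeze-then-train pattern, again on a hypothetical stand-in module (`TinySeq2Seq` and `count_parameters` are illustrative names, not transformers APIs). Counting trainable parameters afterwards is a cheap way to confirm what actually got frozen. Which submodules `model.base_model` covers depends on the model class, so it may be worth running the same count on the real T5 model.

```python
import torch
import torch.nn as nn


class TinySeq2Seq(nn.Module):
    """Hypothetical stand-in: `body` plays the pretrained part, `lm_head` the head."""

    def __init__(self, d_model=8, vocab_size=20):
        super().__init__()
        self.body = nn.Linear(d_model, d_model)
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, x):
        return self.lm_head(torch.tanh(self.body(x)))


def count_parameters(module):
    """Return (trainable, total) parameter counts for a module."""
    trainable = sum(p.numel() for p in module.parameters() if p.requires_grad)
    total = sum(p.numel() for p in module.parameters())
    return trainable, total


torch.manual_seed(0)
model = TinySeq2Seq()

# Freeze only the body; the head stays trainable.
for param in model.body.parameters():
    param.requires_grad = False

trainable, total = count_parameters(model)

# Hand the optimizer only the parameters that are still trainable.
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=0.1
)
body_before = model.body.weight.detach().clone()
head_before = model.lm_head.weight.detach().clone()

x = torch.randn(4, 8)
targets = torch.randint(0, 20, (4,))
loss = nn.functional.cross_entropy(model(x), targets)
loss.backward()
optimizer.step()

body_unchanged = torch.equal(body_before, model.body.weight.detach())
head_changed = not torch.equal(head_before, model.lm_head.weight.detach())
print(trainable, total, body_unchanged, head_changed)
```

Frozen parameters receive no gradient, so the optimizer step leaves them as they were while the head keeps learning.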

If you were to use one of the TensorFlow or Flax auto-models, then you’d have to follow those libraries’ methods for freezing layers if that’s what you wanted to do.

I hope this helps!

mab · April 8, 2022, 11:50pm

Thank you so much for your answer, it is a great help.
By the way, I am trying to add another fully connected layer to the head. Would you know how to do it?
I figured that I could go into the library source code and change the modeling_t5.py file.
It has the lm_head defined as:

self.lm_head = nn.Linear(config.d_model, config.vocab_size, bias=False)

So I thought of changing it to the version below. Does that make sense, and do you think it is the right way of doing it?

self.lm_head = nn.Sequential(
    nn.Linear(config.d_model, config.vocab_size, bias=False),
    nn.Flatten(),
    nn.Linear(config.d_model, config.vocab_size, bias=False),
)

NimaBoscarino · April 11, 2022, 5:29pm

Ah I’m definitely not the right person to answer this, but I think you should be able to just alter the model.lm_head to something like:

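import torch.nn as nn

hidden_size = 512  # placeholder width for the new intermediate layer; choose your own
vocab_size = model.lm_head.out_features  # the final layer must still produce vocab-sized logits
model.lm_head = nn.Sequential(
    nn.Linear(in_features=model.lm_head.in_features, out_features=hidden_size, bias=False),
    nn.Linear(in_features=hidden_size, out_features=vocab_size, bias=False),
)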

And you can add other layers in there too, I think, as long as the final output layer matches the expected dimensions (unless you want to change that too; see “How do I change the classification head of a model?”, post #19 by nielsr).
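A quick shape test can confirm whether a replacement head still matches the expected dimensions. This is a sketch with made-up sizes (`d_model=16`, `vocab_size=50`, `hidden_size=32` are illustrative, not T5's real values) and a random tensor standing in for the decoder output, which has shape (batch, seq_len, d_model). `nn.Linear` acts on the last dimension only, so stacked linear layers keep the (batch, seq_len, ...) layout. An `nn.Flatten` in the middle merges the sequence and feature dimensions, so the next layer no longer receives the input size it was built for.

```python
import torch
import torch.nn as nn

# Illustrative sizes only; a real T5 model has much larger values.
d_model, vocab_size, hidden_size = 16, 50, 32
batch, seq_len = 2, 5

# The shape of the original head, and a two-layer replacement.
original_head = nn.Linear(d_model, vocab_size, bias=False)
new_head = nn.Sequential(
    nn.Linear(d_model, hidden_size, bias=False),
    nn.Linear(hidden_size, vocab_size, bias=False),
)

# Random tensor standing in for the decoder's output.
decoder_output = torch.randn(batch, seq_len, d_model)
original_shape = tuple(original_head(decoder_output).shape)
new_shape = tuple(new_head(decoder_output).shape)

# The Flatten variant: (2, 5, 50) becomes (2, 250), which the second
# Linear (built for 16 input features) cannot consume.
flatten_head = nn.Sequential(
    nn.Linear(d_model, vocab_size, bias=False),
    nn.Flatten(),
    nn.Linear(d_model, vocab_size, bias=False),
)
try:
    flatten_head(decoder_output)
    flatten_failed = False
except RuntimeError:
    flatten_failed = True

print(original_shape, new_shape, flatten_failed)
```

Any layers added this way start from random initialization, so they need training before the model's outputs are useful again.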

If you’re just adding layers to the head I don’t think that you need to edit the source code. If you needed to change things within the network, e.g. to make changes to the T5Block, I think that’s when you would dive into the source code.

I’m curious to learn more about this though, so do update this post with anything new that you learn about this!
