How does finetuning a transformer (T5) work?

Welcome! I’ll take a shot at answering this, but I’m not an expert at this, so I may be wrong!

As far as I understand, when you instantiate a model the weights are not frozen, so if you start finetuning, all parameters will be trainable. If you want to freeze weights, you have to do that yourself, and how you do it depends on which library (PyTorch, TensorFlow, or Flax) you’re using.
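For example, right after loading you can check that nothing is frozen yet (a minimal sketch, assuming a PyTorch install and the t5-small checkpoint):

from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
# Nothing is frozen by default, so this prints True
print(all(p.requires_grad for p in model.parameters()))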

When you use AutoModelForSeq2SeqLM (or any of the other AutoModel* classes) to instantiate a model with .from_pretrained, the backend that gets used is PyTorch (see the Auto Classes docs). So once you’ve loaded the model with the PyTorch backend, if you want to freeze all of the base model’s weights you can access them and freeze them with:

# Freeze the base model's parameters so they are not updated during training
for param in model.base_model.parameters():
    param.requires_grad = False
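You can sanity-check the freeze by counting how many parameters still require gradients; only those will be updated during finetuning (a quick sketch):

# Parameters with requires_grad=True are the only ones the optimizer will update
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {trainable} / {total}")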

If you were to use one of the TensorFlow or Flax auto-models instead, you’d have to follow those libraries’ own methods for freezing layers, if that’s what you wanted to do.
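For instance, with the TensorFlow classes a rough sketch might look like this (assuming TensorFlow is installed and using the t5-small checkpoint; Keras layers use a trainable flag instead of requires_grad):

from transformers import TFAutoModelForSeq2SeqLM

tf_model = TFAutoModelForSeq2SeqLM.from_pretrained("t5-small")
# Setting trainable=False on a Keras layer excludes its weights from training
for layer in tf_model.layers:
    layer.trainable = False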

I hope this helps!
