T5 Seq2Seq custom fine-tuning

I have two questions about fine-tuning T5:

  1. Is there any way to change the lm_head on T5ForConditionalGeneration and initialize it from scratch to support a new vocabulary size?
    I did it by changing the T5ForConditionalGeneration code and adding a new layer called final_layer, but I was wondering if there is an easier way.

  2. Does the T5 generate method use teacher forcing or not?

When you modify the vocab, you also need to resize the token embeddings. The right way to do this is:

  1. Add the new tokens to the tokenizer
    tokenizer.add_tokens(list of new tokens)
  2. Resize token embeddings
    model.resize_token_embeddings(len(tokenizer))

Teacher forcing is used during training. generate does not use teacher forcing: it is meant for inference after training, when no target sequence is available, so each decoding step is conditioned on the model's own previous outputs.
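To make the difference concrete, here is a minimal toy sketch in plain Python. It does not use the transformers library: `toy_next_token` is a hypothetical stand-in for a single decoder step, and the lookup table inside it is made up purely for illustration. With teacher forcing, step t sees the gold token from step t-1; in free-running decoding (what generation does), step t sees the model's own previous prediction.

```python
START = "<s>"
END = "</s>"

def toy_next_token(prev_token):
    """Hypothetical decoder step: predict the next token from the previous one."""
    table = {"<s>": "the", "the": "cat", "cat": "sat", "dog": "ran"}
    return table.get(prev_token, END)

def teacher_forced_predictions(target):
    """Training-style pass: each step is conditioned on the gold previous token."""
    decoder_inputs = [START] + target[:-1]  # target shifted right by one position
    return [toy_next_token(tok) for tok in decoder_inputs]

def free_running_predictions(max_len):
    """Generation-style pass: each step is conditioned on the model's own output."""
    outputs, prev = [], START
    for _ in range(max_len):
        prev = toy_next_token(prev)
        outputs.append(prev)
        if prev == END:
            break
    return outputs

target = ["the", "dog", "ran", END]
print(teacher_forced_predictions(target))
print(free_running_predictions(max_len=4))
```

In this toy, the model wrongly predicts "cat" at the second step in both modes. Under teacher forcing the third step still sees the gold token "dog", so the mistake does not propagate; in free-running mode the third step sees "cat" and the error carries forward.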

Thanks @valhalla for your explanation.

To confirm my understanding:

  1. Resizing the embedding will add extra rows/columns for the new tokens, which are initialised with random weights, correct?

  2. Seq2Seq example:
    It will use teacher forcing during training. Is there any way to disable teacher forcing in the library, or do I have to implement it myself by feeding the model one output at a time sequentially?

Here’s what I used to add some tokens:

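from transformers import T5ForConditionalGeneration, T5Tokenizer

local_dir = "./cryptic_special"
model_name = "t5-small"

special_tokens = ["<DEFN>"]  # extend with any further tokens you need

# Load the tokenizer with the extra special tokens registered
tokenizer_special = T5Tokenizer.from_pretrained(model_name, additional_special_tokens=special_tokens)
model = T5ForConditionalGeneration.from_pretrained(model_name)

# Assumed final steps (not shown in the original post): resize the embeddings
# to the new vocabulary size and save both pieces so local_dir can be loaded later
model.resize_token_embeddings(len(tokenizer_special))
tokenizer_special.save_pretrained(local_dir)
model.save_pretrained(local_dir)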

Then you just adapt the fine_tune script to point to the local_dir (for model and tokenizer)

  1. Yes, the extra embeddings will be initialised randomly.
  2. I don’t think so; you’ll need to implement it yourself.
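On point 1, a conceptual sketch in plain Python of what resizing an embedding table amounts to: existing rows are kept as they are, and one new row per added token is appended with random values. This is not the transformers implementation. `resize_embedding_rows` is a hypothetical helper, and the Gaussian with standard deviation 0.02 is an assumed initialisation chosen only for illustration. Note that only rows (one per token) are added; the hidden dimension stays the same.

```python
import random

def resize_embedding_rows(weights, new_num_tokens, rng):
    """Return a new table with `new_num_tokens` rows.

    Existing rows are copied unchanged. Any extra rows are filled with small
    random values (assumed distribution, for illustration only).
    """
    dim = len(weights[0])
    # Slicing also covers the shrinking case, where rows are simply dropped
    kept = [row[:] for row in weights[:new_num_tokens]]
    extra = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
             for _ in range(new_num_tokens - len(kept))]
    return kept + extra

rng = random.Random(0)
old = [[float(i)] * 4 for i in range(5)]   # 5 tokens, hidden size 4
new = resize_embedding_rows(old, 7, rng)   # vocabulary grows by 2 tokens

print(len(new), len(new[0]))   # number of rows and hidden size after resizing
print(new[:5] == old)          # whether the original rows were preserved
```

Because the new rows start out random, the added tokens only become useful once the model has been fine-tuned on data that contains them.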

Thanks a lot for the example.

Perfect, thanks for the explanation.

This didn’t work for me. How can you reload the model once you’ve resized the embeddings?
The rest of the model resizes, but it seems the lm_head does not, e.g.:

size mismatch for lm_head.weight: copying a param with shape torch.Size([32128, 768]) from checkpoint, the shape in current model is torch.Size([32102, 768])

Disregard this, it was a bug that was fixed in: