T5 for conditional generation: getting started

Hi, I have a specific task for which I’d like to use T5.

Inputs look like
some words <SPECIAL_TOKEN1> some other words <SPECIAL_TOKEN2>

The training outputs are a certain combination of the (some words) and (some other words). The goal is to have T5 learn the composition function that maps the inputs to the outputs, where the output should ideally be fluent language.

I was hoping to find an example script that I could modify. In particular I need a little help understanding how to do these parts:

  1. When generating the input files (i.e. the mapping from input_str to output_str), what is the best format (e.g. a tsv for inputs and a tsv for outputs with a 1:1 mapping by line)?

  2. Add special tokens to the vocab. Assuming my input files contain these special tokens, then to make the model recognize them, I think I should use something like transformers.T5Tokenizer(additional_special_tokens=['<SPECIAL_TOKEN1>', '<SPECIAL_TOKEN2>']). Is this correct?

  3. Additional input processing: I think I need to somehow prepend a new “task tag” to all the input-output pairs. Where would I specify this new task name?

  4. Do I need to register this task somewhere so that it can actually be executed? Some of the examples I saw seem to suggest that I do. And do I need to choose a loss function for my new task? (If I don’t, will one be selected automatically?)

  5. Any tips for the loss function? I care about the outputs being syntactic/grammatical, but I would also like the model to learn the relative positional relations of the inputs.

For example, given an input like a b c, the model might learn that abc, bac, cab, or cba are valid (i.e. in this case “a” and “b” must always be adjacent), and would choose the sequence that is most probable under the language model.
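For concreteness, the 1:1 line-aligned format I have in mind could be written like this (file names here are just placeholders):

```python
# Hypothetical sketch of the line-aligned input/output format described above.
pairs = [
    ("some words <SPECIAL_TOKEN1> some other words <SPECIAL_TOKEN2>",
     "some combined output"),
]
with open("inputs.txt", "w") as f_in, open("outputs.txt", "w") as f_out:
    for src, tgt in pairs:
        f_in.write(src + "\n")   # line N of inputs.txt ...
        f_out.write(tgt + "\n")  # ... pairs with line N of outputs.txt
```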


  1. You can choose whatever format works well for you; the only thing to note is that your dataset or collator should return input_ids, attention_mask, and labels.

  2. To add new tokens:

```python
tokenizer.add_tokens(["<SPECIAL_TOKEN1>", "<SPECIAL_TOKEN2>"])
# resize the embeddings
model.resize_token_embeddings(len(tokenizer))
```

  3. Using a task prefix is optional.
  4. No, you won’t need to register the task; the original T5 repo requires that, but it’s not required here.
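For point 1, here is a minimal sketch of a dataset that returns those three keys (the class name and padding/masking details are illustrative, assuming a Hugging Face-style tokenizer):

```python
import torch
from torch.utils.data import Dataset

class Seq2SeqDataset(Dataset):
    """Returns the three keys the trainer/collator expects:
    input_ids, attention_mask, labels."""

    def __init__(self, tokenizer, sources, targets,
                 max_src_len=200, max_tgt_len=512):
        self.tokenizer = tokenizer
        self.sources = sources
        self.targets = targets
        self.max_src_len = max_src_len
        self.max_tgt_len = max_tgt_len

    def __len__(self):
        return len(self.sources)

    def __getitem__(self, idx):
        src = self.tokenizer(self.sources[idx], max_length=self.max_src_len,
                             truncation=True, padding="max_length",
                             return_tensors="pt")
        tgt = self.tokenizer(self.targets[idx], max_length=self.max_tgt_len,
                             truncation=True, padding="max_length",
                             return_tensors="pt")
        labels = tgt["input_ids"].squeeze(0).clone()
        labels[labels == self.tokenizer.pad_token_id] = -100  # mask pad in loss
        return {
            "input_ids": src["input_ids"].squeeze(0),
            "attention_mask": src["attention_mask"].squeeze(0),
            "labels": labels,
        }
```

Setting padded label positions to -100 keeps them out of the cross-entropy loss, which the model computes automatically when labels is passed.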

You might find these notebooks useful

  1. Fine-tune T5 for Classification and Multiple Choice
  2. Fine-tune T5 for Summarization
  3. Train T5 on TPU

Note: These notebooks manually add the eos token (</s>), but that’s not needed with the current version; the tokenizer will handle it.

Here’s a great thread on tips and tricks for T5 fine-tuning


Thank you!

The linked notebooks seem to do a lot of boilerplate work that is now handled in the seq2seq finetune.py script?

After spending a few hours reading through code, it seems like I should be able to finetune by just

  1. updating the tokens as you’ve described
  2. creating a dataset (train.source, train.target, etc)
  3. running finetune.sh with those as parameters

Is this correct?

Yes, most of the code is similar to finetune.py

To use finetune.py for your use case without modifying it:

  1. add the tokens,
  2. resize the embeddings of the model,
  3. save this model and pass it to finetune.py.
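The three steps above can be sketched as follows. A tiny randomly-initialized model keeps the sketch self-contained; in practice you would start from T5ForConditionalGeneration.from_pretrained("t5-small") with a matching T5Tokenizer, call tokenizer.add_tokens([...]) first, and resize to len(tokenizer):

```python
from transformers import T5Config, T5ForConditionalGeneration

# Small stand-in for a pretrained checkpoint (hyperparameters arbitrary).
config = T5Config(vocab_size=1000, d_model=32, d_ff=64, num_layers=1,
                  num_decoder_layers=1, num_heads=2, d_kv=16)
model = T5ForConditionalGeneration(config)

new_tokens = ["<SPECIAL_TOKEN1>", "<SPECIAL_TOKEN2>"]
# 1.-2. after tokenizer.add_tokens(new_tokens), grow the embedding matrix
model.resize_token_embeddings(config.vocab_size + len(new_tokens))
# 3. save, then pass this directory to finetune.py as the model path
model.save_pretrained("t5_with_special_tokens")
```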

Cool, yes this is what I was able to accomplish!

What would be a good reason for not re-using finetune.py? Or, in the case that someone wants more customization, would it make sense to just tweak the finetune.py file?

Yesterday it took me a while to find the directory examples/seq2seq. I’m curious: why aren’t the modules (e.g. SummarizationModule) available in a higher level transformers directory? (e.g. something like transformers/seq2seq or transformers/LMgeneration)?

You can use finetune.py as it is or modify it as required; there’s no strong reason not to use it. The shared notebooks were written before finetune.py existed.

All examples can be found under examples/.

In case anyone else comes across this thread, I adapted a self-contained T5 finetune example that doesn’t use Lightning.

T5 Finetune on github


I used your GitHub code to fine-tune T5 for text generation.

I have an issue where the output is only partially generated.
I don’t know why the output is cropped.

For example this is the generated text:
“<pad> Kasun has 7 books and gave Nimal 2 of the books. How many book did Ka”

This is the full output.

  • This is not a full sentence, and it does not end with the </s> token.

  • Outputs are at most 17 words. Passing model.generate(..., min_length=30) doesn’t solve the issue either.

  • Only inputs containing a small number of words produce a full output sentence.

Parameters I used:
max_src_len = 200
max_tgt_len = 512

Is that caused by a parameter value that I pass to the tokenizer or decoder?

You should add max_length=None to your model.generate() call, I think. If that doesn’t work, try max_length=500 or something and see if generations are longer. I think you should also set min_length=None.

The reason is that T5ForConditionalGeneration, I think, loads a config file at some point that specifies these parameters. You can see the default values at transformers/generation_utils.py at master · huggingface/transformers · GitHub

So if you want to see what the model is being loaded with when we do .from_pretrained(), call print(model.config). I think we’ll see that the default is max_length=20, which would be causing your problem. Set both max_length and min_length to None, and then the model will stop only when the EOS token is the most probable output.

I think you could also directly modify some of these config parameters at load, e.g. by model.config.max_length = new_value, rather than doing it at the generation call.
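To make this concrete, here is a sketch of both approaches (a tiny randomly-initialized model stands in for the real checkpoint so it runs without downloads; with a pretrained T5, print(model.config) would show the max_length=20 default):

```python
import torch
from transformers import T5Config, T5ForConditionalGeneration

# Small stand-in model; hyperparameters are arbitrary.
config = T5Config(vocab_size=100, d_model=32, d_ff=64, num_layers=1,
                  num_decoder_layers=1, num_heads=2, d_kv=16,
                  decoder_start_token_id=0)
model = T5ForConditionalGeneration(config)
input_ids = torch.tensor([[5, 6, 7]])

model.config.max_length = 8          # change the default generate() uses
short = model.generate(input_ids)    # capped at 8 tokens
longer = model.generate(input_ids, max_length=16)  # per-call override wins
```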


@jsrozner Thank you so much.

Hi @Savidhu, does it actually work for you? Setting min_length / max_length doesn’t seem to change the output for me.


@pafitis Yes that actually worked.
I changed the configuration as @jsrozner mentioned.

My settings are:

* max_length=None for model.generate()
* min_length=None for model.generate()
* model.config.max_length = 50

You need to change model.config.max_length also.


Hey all, I have been trying to fine-tune T5 on XSum, and I am getting a constant validation loss. It doesn’t change at all. The training loss varies a bit but doesn’t converge, staying in the range [10.0, 12.0]. I tried many approaches, like creating my own nn.Module compatible with Trainer(), etc., but none worked.
Link to colab (first version where I used default Trainer()).

Can anyone share a colab link or wandb project for my reference?



I have a Colab notebook on how to fine-tune T5 on a custom dataset here: GitHub - NielsRogge/Transformers-Tutorials: This repository contains demos I made with the Transformer

Let me know if it helps.

I am fine-tuning a T5 model for QA on my dataset, but the vocabulary is so different from the tokenizer’s that it results in excessively long token_ids/tokens. Can I train a new tokenizer from the existing one and use it for fine-tuning? If yes, any tips/resources to aid?