I have two questions regarding fine-tuning T5:
Is there any way to change the lm_head on T5ForConditionalGeneration so it is initialized from scratch to support a new vocabulary size?
I did it by changing the T5ForConditionalGeneration code and adding a new layer called final_layer, but I was wondering if there is an easier way (a rough sketch of the idea follows below).
Does the T5 generate method use teacher forcing or not?
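For the first question, roughly what I mean is something like this minimal sketch (illustrative only: it reassigns the head directly instead of editing the library source, and the input embeddings would also need to match the new vocabulary size):

import torch.nn as nn
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("t5-small")
new_vocab_size = 32200  # hypothetical target vocabulary size

# Replace the output projection with a freshly initialized one
model.set_output_embeddings(nn.Linear(model.config.d_model, new_vocab_size, bias=False))
model.config.vocab_size = new_vocab_size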
When you modify the vocab, you also need to resize the token embeddings. The right way to do this is:
1. Add the new tokens to the tokenizer:
tokenizer.add_tokens(list_of_new_tokens)
2. Resize the token embeddings:
model.resize_token_embeddings(len(tokenizer))
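Putting the two steps together, a minimal sketch (the added tokens here are just placeholders):

from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# 1. Add the new tokens to the tokenizer
tokenizer.add_tokens(["<new_token_1>", "<new_token_2>"])

# 2. Resize the token embeddings so the model matches the new vocabulary size
model.resize_token_embeddings(len(tokenizer))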
Teacher forcing is used during training. generate does not use teacher forcing, since it is not used in training and is meant for generation after training.
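To make the difference concrete, a rough sketch (the model and inputs are just examples): passing labels to the forward pass trains with teacher forcing, while generate decodes autoregressively from its own predictions.

from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

inputs = tokenizer("translate English to German: Hello", return_tensors="pt")
labels = tokenizer("Hallo", return_tensors="pt").input_ids

# Training: the decoder sees the (shifted) gold tokens at every step, i.e. teacher forcing
loss = model(input_ids=inputs.input_ids, labels=labels).loss

# Inference: generate feeds back its own predictions instead of the gold tokens
generated_ids = model.generate(inputs.input_ids, max_length=20)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))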
Thanks @valhalla for your explanation.
To confirm my understanding:
Resizing the embedding will add extra rows/columns for the new tokens, which are initialized with random weights, correct?
Seq2Seq example:
https://github.com/huggingface/transformers/blob/master/examples/seq2seq/seq2seq_trainer.py#L119
will use teacher forcing during training. Is there any way to disable teacher forcing in the library, or do I have to implement it myself by feeding the model one output at a time sequentially?
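Something like this rough sketch is what I have in mind for implementing it myself (illustrative only: greedy feedback, no caching, and the encoder is re-run at every step):

import torch
import torch.nn.functional as F

def forward_without_teacher_forcing(model, input_ids, labels):
    # Start the decoder from the model's decoder_start_token_id
    decoder_input_ids = torch.full(
        (input_ids.size(0), 1), model.config.decoder_start_token_id,
        dtype=torch.long, device=input_ids.device,
    )
    step_logits = []
    for _ in range(labels.size(1)):
        out = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids, return_dict=True)
        next_logits = out.logits[:, -1, :]        # logits for the current position
        step_logits.append(next_logits)
        next_tokens = next_logits.argmax(dim=-1)  # feed back the model's own prediction
        decoder_input_ids = torch.cat([decoder_input_ids, next_tokens.unsqueeze(-1)], dim=-1)
    logits = torch.stack(step_logits, dim=1)
    # Gradients only flow through the per-step logits, not the hard argmax feedback
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)), labels.reshape(-1))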
Here’s what I used to add some tokens:
from transformers import T5Tokenizer, T5ForConditionalGeneration

model_name = "t5-small"
local_dir = "./cryptic_special"

special_tokens = ["<DEFN>", "<ANAG>", "<ANS>", "<INDIC>"]

# Load a tokenizer that knows about the extra special tokens, plus the pretrained model
tokenizer_special = T5Tokenizer.from_pretrained(model_name, additional_special_tokens=special_tokens)
model = T5ForConditionalGeneration.from_pretrained(model_name)

# Resize the embeddings to match the enlarged vocabulary, then save both
model.resize_token_embeddings(len(tokenizer_special))
tokenizer_special.save_pretrained(local_dir)
model.save_pretrained(local_dir)
Then you just adapt the fine_tune script to point to the local_dir (for model and tokenizer)
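Once saved, both can be reloaded from local_dir like any other pretrained checkpoint, e.g.:

from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("./cryptic_special")
model = T5ForConditionalGeneration.from_pretrained("./cryptic_special")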
Thanks a lot for the example.
Perfect, thanks for the explanation.
This didn’t work for me. How can you reload the model once you’ve resized the embedding?
The rest of the model resizes, but it seems the lm_head will not, e.g.:
size mismatch for lm_head.weight: copying a param with shape torch.Size([32128, 768]) from checkpoint, the shape in current model is torch.Size([32102, 768])
Disregard this, it was a bug that was fixed in this PR: huggingface:master ← patrickvonplaten:fix_t5_resize_tokens (opened 04:59 PM, 01 Dec 20 UTC)
# What does this PR do?
This PR extends the `resize_embeddings` function in PyTorch to models that have input/output embeddings that are **not** tied.
In PyTorch all models that have tied input/output embeddings by default can also untie those embeddings by setting `config.tie_word_embeddings=False`. This however requires the `_resize_token_embeddings` to be extended to also resize the `lm_head`. This PR does this extension by adding a `_get_resized_lm_head` method. Also, all models that have a `get_output_embedding()` function, now need a `set_output_embedding()` function. A test is added to make sure the new functionality works as expected. The Bart-like models currently skip this test because there is a rather weird `lm_head` behavior that I want to refactor in another PR.
In addition this PR:
- Fixes #8706: With MT5 and T5v1_1, T5 now has a configuration where input and output embeddings are not tied anymore. This PR fixes this.
- Refactors MobileBert
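To see the behaviour the PR describes, a small sketch on a model whose input/output embeddings are untied (model name assumed to be available on the hub):

from transformers import T5ForConditionalGeneration

# google/t5-v1_1-small has config.tie_word_embeddings=False, so the lm_head
# must be resized separately from the input embeddings
model = T5ForConditionalGeneration.from_pretrained("google/t5-v1_1-small")
print(model.get_output_embeddings().weight.shape)  # torch.Size([32128, 512])

model.resize_token_embeddings(32200)               # hypothetical new vocabulary size
print(model.get_output_embeddings().weight.shape)  # torch.Size([32200, 512]) after the fix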