Example of how to pretrain T5?

mralexis · March 3, 2021, 8:27pm

Is there any codebase in Hugging Face that could be used to pretrain a T5 model? Looking in the examples directory of the repo, I can't find anything about T5. Thanks!

mralexis · March 4, 2021, 6:11pm

Still need help on this…

lewtun · March 4, 2021, 6:26pm

Hi @mralexis, there’s a GitHub issue that might help you: "How do I pre-train the T5 model in HuggingFace library using my own text corpus?" (Issue #5079, huggingface/transformers on GitHub).

In particular, T5ForConditionalGeneration is probably what you are looking for to do pretraining; see the T5 page in the transformers 4.3.0 documentation.

mralexis · March 4, 2021, 6:39pm

@lewtun Thanks for the quick reply! I did check it out, but it only has a code block showing how to calculate the pretraining loss; the other implementation details, which are also critical, are missing. Do you know whether there is code for that?

lewtun · March 4, 2021, 6:52pm

Unfortunately, I do not know where to find a detailed example of T5 pretraining, so I’m pinging @valhalla in case he does.

valhalla · March 15, 2021, 7:41am

Hey guys, sorry about the super late response.

T5 pre-training is not implemented in Transformers; AFAIK it’s only available in the original T5 repo.
What we need in order to implement it with Transformers is a T5-style denoising dataset. It’s on my to-do list to implement this, hopefully early next month.
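
To make the "T5 style denoising" idea concrete, here is a minimal word-level sketch of span corruption: each noise span in the input is replaced by a sentinel token, and the target lists each sentinel followed by the tokens it hides, closed by one final sentinel. The function name `span_corrupt` and the hand-picked spans are assumptions for illustration; a real dataset would sample spans randomly and work on tokenizer ids rather than words.

```python
from typing import List, Sequence, Tuple


def span_corrupt(
    tokens: Sequence[str], noise_spans: Sequence[Tuple[int, int]]
) -> Tuple[List[str], List[str]]:
    """Replace each half-open (start, end) span with a sentinel token.

    Hypothetical helper: returns (corrupted_input, target) as token lists.
    """
    inputs: List[str] = []
    targets: List[str] = []
    cursor = 0
    for i, (start, end) in enumerate(sorted(noise_spans)):
        if not (cursor <= start < end <= len(tokens)):
            raise ValueError("spans must be non-empty, in range, and non-overlapping")
        sentinel = f"<extra_id_{i}>"
        inputs.extend(tokens[cursor:start])  # keep the text before the span
        inputs.append(sentinel)              # hide the span behind a sentinel
        targets.append(sentinel)             # the target names the sentinel...
        targets.extend(tokens[start:end])    # ...followed by the hidden tokens
        cursor = end
    inputs.extend(tokens[cursor:])           # keep the text after the last span
    targets.append(f"<extra_id_{len(noise_spans)}>")  # closing sentinel
    return inputs, targets


words = "Thank you for inviting me to your party last week .".split()
corrupted, target = span_corrupt(words, [(2, 4), (8, 9)])
print(" ".join(corrupted))  # Thank you <extra_id_0> me to your party <extra_id_1> week .
print(" ".join(target))     # <extra_id_0> for inviting <extra_id_1> last <extra_id_2>
```

The corrupted sequence would be fed to the encoder and the target used as labels for T5ForConditionalGeneration.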

stykat · April 2, 2021, 1:15pm

Hey! Just checking in on this to see if anyone has any updates.
Thank you

ncoop57 · May 7, 2021, 8:30pm

I am also interested in this and actually have a semi-working version (needs more testing) based on the original T5 repo. I’d be happy to work together to bring this to the transformers library if it is still on the roadmap. Here is the Colab with the current implementation: Google Colaboratory (scroll down or CTRL-F for DataCollatorForSeq2SeqMaskLanguageModeling).

I can also open a PR to start this process if interested.

scienceapptest · June 11, 2021, 9:41pm

Any more developments here? My understanding is that we’d have to pre-train using the standard Trainer class with a custom Data Collator as described by @ncoop57. @valhalla would you be able to help/comment?

saichandra · September 21, 2021, 8:37am

Hi @valhalla, any update on this?

nielsr · September 21, 2021, 9:45am

T5 pre-training is now supported in JAX/FLAX. You can check out the example script here: transformers/examples/flax/language-modeling at master · huggingface/transformers · GitHub. It actually includes 2 scripts:

t5_tokenizer_model.py, to train a T5 tokenizer (i.e. SentencePiece) from scratch.
run_t5_mlm_flax.py, to pre-train T5. It’s suited to run on TPUs (for which you can obtain access for free by applying to Google’s TFRC program).

@patrickvonplaten also demonstrates how to run the script in this video (starts around 13:35).

This script was developed for the JAX/FLAX community event. It would be really cool if someone contributes the PyTorch version of it. It would mean translating the script from FLAX to PyTorch, which is probably straightforward.
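
One detail a PyTorch port would need to reproduce is the length arithmetic for span corruption: because each noise span collapses into a single sentinel, the raw text chunk has to be longer than the encoder input length. The sketch below follows that logic as I understand it from the original T5 approach; the function name `corruption_lengths` is hypothetical, and the defaults (15% noise density, mean span length 3) are the T5 paper's settings, assumed here for illustration.

```python
from typing import Tuple


def corruption_lengths(
    inputs_length: int,
    noise_density: float = 0.15,
    mean_span_length: float = 3.0,
) -> Tuple[int, int]:
    """Return (raw_length, targets_length) for a desired encoder input length.

    raw_length is the number of tokens to take from the corpus so that, after
    span corruption, the encoder input has at most inputs_length tokens.
    """

    def after_corruption(raw_length: int) -> Tuple[int, int]:
        num_noise = int(round(raw_length * noise_density))
        num_spans = int(round(num_noise / mean_span_length))
        num_kept = raw_length - num_noise
        # Encoder input: kept tokens + one sentinel per span + EOS.
        # Target: noise tokens + one sentinel per span + EOS.
        return num_kept + num_spans + 1, num_noise + num_spans + 1

    raw_length = inputs_length
    # Grow the raw chunk while the corrupted input still fits.
    while after_corruption(raw_length + 1)[0] <= inputs_length:
        raw_length += 1
    _, targets_length = after_corruption(raw_length)
    return raw_length, targets_length


raw_length, targets_length = corruption_lengths(512)
print(raw_length, targets_length)  # 568 114
```

With these settings, chunks of 568 raw tokens yield encoder inputs of 512 tokens and targets of 114 tokens.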

YiTian · January 13, 2022, 10:15am

Hi, I converted the parameters trained with JAX/FLAX to the PyTorch version:
model = FlaxT5ForConditionalGeneration.from_pretrained(pretrained_path)
pt_model = T5ForConditionalGeneration.from_pretrained(tmp_path, from_flax=True)

However, some weights of T5ForConditionalGeneration were not initialized from the Flax model.
Here are the details.
All Flax model weights were used when initializing T5ForConditionalGeneration.
Some weights of T5ForConditionalGeneration were not initialized from the Flax model and are newly initialized: ['decoder.embed_tokens.weight', 'encoder.embed_tokens.weight', 'lm_head.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

I guess these three weights are shared.
So I added three lines before saving the parameters:
pt_model.encoder.embed_tokens.weight.data = model.params['shared']['embedding']._value
pt_model.decoder.embed_tokens.weight.data = model.params['shared']['embedding']._value
pt_model.lm_head.weight.data = model.params['shared']['embedding']._value
pt_model.save_pretrained(tmp_path)

Is this RIGHT?

StephennFernandes · March 28, 2022, 2:11pm

@lewtun @valhalla @nielsr @patrickvonplaten I am planning to pretrain multilingual T5 small and/or medium from scratch. I came across this post and the Hugging Face implementation for T5. My question is: can I use the same T5 pretraining script, replacing T5Config with MT5Config? Would this work?

Also, how should the dataset be arranged for multilingual pretraining? Should the languages be arranged sequentially, with all the sequences of one language followed by those of another, e.g. [French, German, Italian], or should all the languages be randomly shuffled?

For the record, I am planning to pretrain mT5 on Indian languages using the OSCAR corpus and some additionally sourced text.
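
On the ordering question, one common alternative to arranging languages sequentially is to mix them and draw the language of each example at random. The mT5 paper describes sampling with probability p(L) ∝ |L|^α and α = 0.3, which boosts low-resource languages relative to their raw share. Below is a minimal sketch of that idea; the corpus sizes are made up, and the function names `language_sampling_probs` and `sample_languages` are hypothetical, not part of any library.

```python
import random
from typing import Dict, List


def language_sampling_probs(sizes: Dict[str, int], alpha: float = 0.3) -> Dict[str, float]:
    """Smooth corpus sizes with exponent alpha and normalize to probabilities."""
    weights = {lang: n ** alpha for lang, n in sizes.items()}
    total = sum(weights.values())
    return {lang: w / total for lang, w in weights.items()}


def sample_languages(probs: Dict[str, float], num_examples: int, seed: int = 0) -> List[str]:
    """Draw the language of each training example independently (shuffled mix)."""
    rng = random.Random(seed)
    langs = list(probs)
    return rng.choices(langs, weights=[probs[lang] for lang in langs], k=num_examples)


# Made-up example counts per language, for illustration only.
corpus_sizes = {"hi": 1_000_000, "ta": 100_000, "mr": 10_000}
probs = language_sampling_probs(corpus_sizes)
schedule = sample_languages(probs, num_examples=10)
print(probs)
print(schedule)
```

With alpha below 1, the smallest corpus is sampled far more often than its raw share of the data, while the largest corpus still dominates.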

AmineOueslati · April 17, 2022, 4:07pm

@StephennFernandes
Hi, did it work for you with mT5?

jessicalopez · October 17, 2022, 3:47pm

Hello @valhalla, do you have any updates? Thank you in advance.

pnawrot · March 16, 2023, 4:31pm

We’ve released nanoT5, a minimal codebase that reproduces pre-training of the T5 model (similar to BART) in PyTorch (not Flax), using Hugging Face.

You can take a look!

Any suggestions are more than welcome.
