Example of how to pretrain T5?

We’ve released nanoT5, a minimal codebase that reproduces T5 (an encoder-decoder model, similar to BART) pre-training in PyTorch (rather than the original Flax implementation), using Hugging Face.
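For context, the heart of T5 pre-training is the span-corruption denoising objective: random spans of the input are replaced with sentinel tokens, and the model learns to emit each sentinel followed by the tokens it hid. Below is a minimal pure-Python sketch of that preprocessing step (this is illustrative, not the actual nanoT5 code; the function name and defaults are my own):

```python
import random

def span_corrupt(tokens, noise_density=0.15, mean_span_len=3, seed=0):
    """T5-style span corruption sketch (illustrative, not nanoT5's code).

    Masked positions are replaced in the input by sentinel tokens
    <extra_id_0>, <extra_id_1>, ...; the target sequence lists each
    sentinel followed by the tokens it replaced.
    """
    rng = random.Random(seed)
    n = len(tokens)
    num_noise = max(1, round(n * noise_density))
    # Choose which positions to corrupt; consecutive picks merge into spans.
    masked = set(rng.sample(range(n), num_noise))
    inputs, targets = [], []
    sentinel = 0
    i = 0
    while i < n:
        if i in masked:
            tok = f"<extra_id_{sentinel}>"
            inputs.append(tok)   # sentinel stands in for the whole span
            targets.append(tok)  # target repeats the sentinel, then the span
            while i < n and i in masked:
                targets.append(tokens[i])
                i += 1
            sentinel += 1
        else:
            inputs.append(tokens[i])
            i += 1
    return inputs, targets
```

The corrupted input goes to the encoder and the target to the decoder, trained with standard cross-entropy; interleaving sentinels and spans lets the original sequence be reconstructed exactly from the two halves.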

You can take a look!

Any suggestions are more than welcome.