Fine-tuning T5 Encoder and T5 Decoder separately

Is it possible to create instances of the T5 encoder and T5 decoder separately? I would like to fine-tune the T5 encoder with a masked language modelling objective on a company-specific dataset and then use it with the T5 decoder for text generation.
I have searched extensively on Hugging Face for these classes and/or their documentation but could not find anything.

Hey @ND1, did you find anything on this? I looked into the same thing but couldn’t find anything either. We can extract the T5 encoder and decoder separately (see the sketch below). However, tuning them with AutoModelForMaskedLM does not seem possible, I guess.
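
For completeness, this is what I mean by extracting them: both stacks are plain submodules of the full seq2seq model, so something like this should work (just a sketch; the checkpoint name is only an example):

```python
from transformers import T5Model

model = T5Model.from_pretrained("t5-small")  # any T5 checkpoint would do

# Both stacks are exposed as submodules of the full seq2seq model.
encoder = model.get_encoder()  # T5Stack with self-attention only
decoder = model.get_decoder()  # T5Stack with cross-attention over encoder states

print(type(encoder).__name__, type(decoder).__name__)  # -> T5Stack T5Stack
```

So getting hold of the modules is easy; the problem is that there is no masked-LM head or training class for them.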

EDIT…

I think I was right. T5 is a sequence-to-sequence model, which means that, at least for language modelling, it needs to be trained in that fashion.

Another reason you can’t train the encoder/decoder separately for LM is that, for the encoder, you’d likely want to use AutoModelForMaskedLM, but T5 is not supported by that class. This makes sense if you think about it:

MLM models like BERT predict individual [MASK] tokens, whereas T5 is pre-trained with span corruption: contiguous spans are replaced by sentinel tokens and the model generates the dropped-out spans, which is a different objective from [MASK] prediction. For example:
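
You can see the difference straight from the tokenizer; a quick sketch (exact token ids depend on the checkpoint):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("t5-small")

# T5 has no BERT-style [MASK] token; instead it ships sentinel tokens
# (<extra_id_0> ... <extra_id_99>) that mark the corrupted spans.
print(tok.mask_token)                              # None
print(tok.convert_tokens_to_ids("<extra_id_0>"))   # a real vocab id, e.g. 32099 for t5-small

# A span-corruption pair in the format the T5 pre-training objective expects:
corrupted_input = "Thank you <extra_id_0> me to your party <extra_id_1> week."
target          = "<extra_id_0> for inviting <extra_id_1> last <extra_id_2>"
```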

So, I would recommend using this script: `examples/flax/language-modeling/run_t5_mlm_flax.py` from the huggingface/transformers repo on GitHub,

which will get you up and running with training T5 in its native seq2seq (span-corruption) fashion. Then you can throw away the decoder and use just the encoder for downstream tasks, for example:
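
Once that training is done, loading only the encoder is a one-liner with T5EncoderModel (a sketch; the checkpoint path is a placeholder for wherever the script saved your model):

```python
import torch
from transformers import AutoTokenizer, T5EncoderModel

tok = AutoTokenizer.from_pretrained("path/to/your-t5-mlm-checkpoint")
encoder = T5EncoderModel.from_pretrained(
    "path/to/your-t5-mlm-checkpoint",  # placeholder for your output_dir
    from_flax=True,  # the Flax script saves Flax weights; drop this if you converted to PyTorch
)

inputs = tok("some company-specific sentence", return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state  # (batch, seq_len, d_model)
```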

Hope this helps :slight_smile: