Data preparation for T5 model

Hi there!

I want to further pre-train a T5 model. While looking around, I came across transformers/run_t5_mlm_flax.py (main branch of huggingface/transformers on GitHub).

I am not sure which parts of this large script I need for my data preparation. I have an input file that I want to process, and I am trying to find the masking function that adds the extra_tokens_ids.
Would it be better to run the whole script on my input rather than trying to pull out just the data processing part?
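For orientation, here is a minimal plain-Python sketch of what a T5-style span-corruption step does with the extra (sentinel) tokens. It is not the script's code. In that script, this work appears to happen in the collator class `FlaxDataCollatorForT5MLM`, which operates on NumPy batches through helpers named `create_sentinel_ids` and `filter_input_ids`. The sketch makes several assumptions: the standard T5 tokenizer with `len(tokenizer) == 32100`, `<extra_id_0>` at id 32099 with later sentinels counting down, `</s>` at id 1, and a mask that is already given and does not mask the first token. The function names here are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumptions (illustrative, not read from the script): standard T5 tokenizer,
# len(tokenizer) == 32100, <extra_id_0> == 32099 counting down, </s> == 1.
VOCAB_SIZE = 32100
EOS_ID = 1


def replace_spans_with_sentinels(
    tokens: Sequence[int],
    mask: Sequence[bool],
    vocab_size: int = VOCAB_SIZE,
    eos_id: int = EOS_ID,
) -> List[int]:
    """Collapse each contiguous run of masked positions into one sentinel id.

    Unmasked tokens are kept; the k-th masked run becomes <extra_id_k>,
    i.e. vocab_size - 1 - k. An EOS id is appended at the end.
    """
    if len(tokens) != len(mask):
        raise ValueError("tokens and mask must have the same length")
    out: List[int] = []
    span_index = 0
    for i, (tok, masked) in enumerate(zip(tokens, mask)):
        if masked:
            # Only the first position of a masked run emits a sentinel.
            if i == 0 or not mask[i - 1]:
                out.append(vocab_size - 1 - span_index)
                span_index += 1
        else:
            out.append(tok)
    out.append(eos_id)
    return out


def span_corrupt(tokens: Sequence[int], mask: Sequence[bool]) -> Tuple[List[int], List[int]]:
    """Build (input_ids, labels) for T5-style span corruption.

    The inputs hide the masked spans. The labels use the complementary mask,
    so they keep exactly the hidden tokens; when the first token is unmasked,
    each hidden span is preceded by the same sentinel used in the inputs.
    """
    input_ids = replace_spans_with_sentinels(tokens, mask)
    labels = replace_spans_with_sentinels(tokens, [not m for m in mask])
    return input_ids, labels


tokens = [10, 11, 12, 13, 14]
mask = [False, True, True, False, True]
print(span_corrupt(tokens, mask))
# ([10, 32099, 13, 32098, 1], [32099, 11, 12, 32098, 14, 1])
```

Building the labels by running the same function on the inverted mask keeps the inputs and labels consistent without separate bookkeeping. Choosing which spans to mask (the random noise mask) is a separate step that this sketch leaves out.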

Thank you, I would appreciate any help!

Hi,

I think these lines are the relevant ones for isolating the data preparation: transformers/run_t5_mlm_flax.py at commit ef42c2c487260c2a0111fa9d17f2507d84ddedea (huggingface/transformers on GitHub).
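Before any masking happens, the script's data preparation tokenizes the text and cuts it into fixed-length blocks. It appears to do this with a helper named `group_texts`, passed to `datasets.Dataset.map(..., batched=True)`. Below is a minimal standalone sketch of that grouping step. It assumes a batch is a dict mapping column names to lists of token-id lists, as a batched `map` provides. `block_size` is a hypothetical parameter name used here for illustration.

```python
from itertools import chain
from typing import Dict, List


def group_texts(examples: Dict[str, List[List[int]]], block_size: int) -> Dict[str, List[List[int]]]:
    """Concatenate each column of a batch and split it into blocks of block_size.

    Tokens left over after the last full block are dropped. If the whole batch
    is shorter than one block, a single short block is returned instead.
    """
    # Flatten every column (e.g. "input_ids") into one long token list.
    concatenated = {k: list(chain.from_iterable(v)) for k, v in examples.items()}
    total_length = len(next(iter(concatenated.values())))
    if total_length >= block_size:
        # Trim to a multiple of block_size so all blocks have equal length.
        total_length = (total_length // block_size) * block_size
    return {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated.items()
    }


batch = {"input_ids": [[1, 2, 3], [4, 5, 6, 7], [8]]}
print(group_texts(batch, block_size=3))
# {'input_ids': [[1, 2, 3], [4, 5, 6]]}
```

As far as I can tell, the script groups into blocks slightly longer than the target input length (it computes an "expanded" length from the noise density and mean span length). The reason is that collapsing each masked span into a single sentinel shortens the sequence, so the longer blocks bring the inputs back to `max_seq_length` after masking.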