Data preparation for T5 model

CRini · May 4, 2023, 4:39am

Hi there!

I want to further pre-train a T5 model and, while looking around, came across transformers/run_t5_mlm_flax.py at main · huggingface/transformers · GitHub

I am not sure which part of this huge script I need for my data preparation. I have an input file that I want to process, and I am trying to find the masking function that adds the extra_tokens_ids.
Is it better to try this whole script with my input rather than trying to pull out the data processing script?

Thank you, I would appreciate any help!

nielsr · May 7, 2023, 5:36pm

Hi,

I think these lines are relevant for isolating the data preparation: transformers/run_t5_mlm_flax.py at ef42c2c487260c2a0111fa9d17f2507d84ddedea · huggingface/transformers · GitHub.
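
To make the idea concrete, here is a minimal sketch of the span-corruption step in plain Python. It is not the code from the script, just an illustration of the technique: each run of masked tokens is replaced by one sentinel token in the input, and the target lists each sentinel followed by the tokens it hides. The function name `span_corrupt`, the default `first_sentinel_id=32099` (the id of `<extra_id_0>` in the standard T5 vocabulary, with later sentinels counting down) and `eos_id=1` are assumptions, so check them against your own tokenizer. The mask is passed in by hand here; how the script samples it is not covered.

```python
def span_corrupt(token_ids, noise_mask, first_sentinel_id=32099, eos_id=1):
    """Illustrative T5-style span corruption (hypothetical helper).

    token_ids:  list of int token ids for one example
    noise_mask: list of bool, True where a token should be masked out
    Returns (inputs, targets) as lists of ids, each ending with eos_id.
    """
    inputs, targets = [], []
    sentinel = first_sentinel_id  # assumed id of <extra_id_0>
    prev_masked = False
    for tok, masked in zip(token_ids, noise_mask):
        if masked:
            if not prev_masked:
                # Start of a new masked span: one sentinel marks it on
                # both sides, then the next span uses the next sentinel.
                inputs.append(sentinel)
                targets.append(sentinel)
                sentinel -= 1
            targets.append(tok)  # hidden tokens go to the target
        else:
            inputs.append(tok)   # visible tokens stay in the input
        prev_masked = masked
    inputs.append(eos_id)
    targets.append(eos_id)
    return inputs, targets


# Toy example with made-up token ids and a hand-written mask.
ids = [10, 11, 12, 13, 14, 15, 16, 17]
mask = [False, False, True, True, False, False, True, False]
model_inputs, labels = span_corrupt(ids, mask)
print(model_inputs)  # [10, 11, 32099, 14, 15, 32098, 17, 1]
print(labels)        # [32099, 12, 13, 32098, 16, 1]
```

With a real tokenizer you would look the sentinel ids up (for example by converting `<extra_id_0>` to an id) rather than hard-coding them.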
