T5 instruction finetuning

Hi all!

I am trying to instruct-finetune a T5 model for harmful text classification.
Unfortunately, I am a little confused about which padding strategy one should use and whether choosing a different one makes any difference.

Right now I am training with padding=True (longest) set in the tokenizer.
To my understanding, padding = True pads each batch to the longest sequence in the batch.
I perform padding ‘on the fly’ in the collate method.
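For context, here is a simplified sketch of what I mean by padding on the fly in the collator (the field names "text" / "label_text" and the t5-base checkpoint are just placeholders, not my actual setup):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-base")

def collate_fn(batch):
    # "text" and "label_text" are placeholder field names
    inputs = [example["text"] for example in batch]
    targets = [example["label_text"] for example in batch]

    # padding=True ("longest") pads only up to the longest sequence in this batch
    model_inputs = tokenizer(
        inputs, padding=True, truncation=True, return_tensors="pt"
    )
    labels = tokenizer(
        targets, padding=True, truncation=True, return_tensors="pt"
    ).input_ids
    # mask pad tokens in the labels so they are ignored by the loss
    labels[labels == tokenizer.pad_token_id] = -100
    model_inputs["labels"] = labels
    return model_inputs
```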

In a lot of T5 finetuning implementations, though, the training dataset is tokenized as a preprocessing step with padding=max_length, which means all samples are padded to the same fixed maximum length.
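From what I can tell, those implementations do something roughly like this in a preprocessing map (again just a sketch; raw_dataset, the column names and the max lengths are placeholders):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-base")

def preprocess(example):
    # every sample is padded to the same fixed length, independent of its batch
    model_inputs = tokenizer(
        example["text"], padding="max_length", truncation=True, max_length=512
    )
    labels = tokenizer(
        example["label_text"], padding="max_length", truncation=True, max_length=16
    )
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

# raw_dataset is assumed to be a datasets.Dataset with "text" / "label_text" columns
tokenized_dataset = raw_dataset.map(preprocess, remove_columns=raw_dataset.column_names)
```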

Now I am wondering if there is any right or wrong here, and whether it makes any difference which of the two methods I use.

One thing I noticed is that using max_length padding instead of longest padding increases the training time quite a lot, which I assume is because every sequence is then padded to the full maximum length, so the model ends up processing a lot of pad tokens.

Thanks in advance for any advice,

Cheers,

M