I think I may have found a way around this issue (or at least the trainer now starts and completes!). The torch.utils.data.Dataset subclass from the DistilBERT example in "Fine-tuning with custom datasets" needs changing as follows. I suspect this is because the DistilBERT example's labels are just a list of integers, whereas T5 has output texts as targets; from what I have read, I assume DataCollatorForSeq2Seq() takes care of preprocessing the labels (the target encodings) into the features expected by the T5 model's forward function (I am guessing, but this is my understanding). Code changes below: