Can T5 "forget" Appendix D tasks after fine-tuning?

I have been using the example code run_summarization.py (with the "summarize: " prefix) to do some fine-tuning on a medical dataset, based on the denoising pattern from the Hugging Face website using sentinel tokens like <extra_id_0>, <extra_id_1> (see link). I was hoping to keep using some of the pre-trained tasks (such as "summarize: " or "question: ") that were trained into T5 (in my case t5-large specifically).
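
To be concrete, the denoising pattern I mean looks roughly like this (a minimal sketch based on the Hugging Face docs example; my actual preprocessing of the medical dataset differs in the details):

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-large")
model = T5ForConditionalGeneration.from_pretrained("t5-large")

# Span corruption: sentinels in the input mark the masked spans,
# and the target spells out what each sentinel should expand to.
input_ids = tokenizer(
    "The <extra_id_0> walks in <extra_id_1> park", return_tensors="pt"
).input_ids
labels = tokenizer(
    "<extra_id_0> cute dog <extra_id_1> the <extra_id_2>", return_tensors="pt"
).input_ids

loss = model(input_ids=input_ids, labels=labels).loss
print(loss.item())
```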

So I set up a Colab sheet to demo the Appendix D tasks from the T5 paper, which works fine when using t5-large prior to fine-tuning. I do the fine-tuning on t5-large, which seems to go OK (as far as I can tell: loss goes down, ROUGE scores go up, manual inspection shows reasonable predictions, etc.). Then I push the model back to the Hub and attempt to use the fine-tuned model with the Appendix D tasks in my Colab sheet.
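
For reference, the Colab check is essentially just feeding the Appendix D prefixes through generate(), along these lines (a minimal sketch; the prompts here are paraphrased examples, not the exact ones from my sheet):

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-large")
model = T5ForConditionalGeneration.from_pretrained("t5-large")

# A few Appendix D style task prefixes:
prompts = [
    "translate English to German: The house is wonderful.",
    "cola sentence: The course is jumping well.",
    "summarize: The patient presented with chest pain and shortness of breath.",
]
for p in prompts:
    input_ids = tokenizer(p, return_tensors="pt").input_ids
    out = model.generate(input_ids, max_new_tokens=32)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```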

What I find is that my fine-tuned t5-large seems to have in some way “forgotten” the pre-trained tasks, and no longer answers any of them (no output text). So to check that the fine-tuned model was not completely corrupted (or some other screw-up), I added an example denoising task using <extra_id_0>, <extra_id_1> etc. as above, and the fine-tuned model completed this as expected. So the model is “working”, but only recognises the task it was specifically fine-tuned on.

So what is going on? This is my first project with Transformers, so apologies below if any of my guesses are crazy:

  • Is this what can happen with T5 after the kind of fine-tuning I am attempting?
  • Is it related to the amount of fine-tuning (I use around 100k text samples)?
  • Is it related to the learning rate being too high (I use a constant 1e-4; see the sketch after this list)?
  • Is it related to my own incompetence, and I am completely missing something? :wink:
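
On the learning-rate point, here is roughly how my settings translate into Seq2SeqTrainingArguments (a minimal sketch; the output path, batch size and epoch count are placeholders, not my exact values):

```python
from transformers import Seq2SeqTrainingArguments

# Mirrors the flags I pass to run_summarization.py; the constant 1e-4
# schedule is the setting I suspect may be too aggressive.
args = Seq2SeqTrainingArguments(
    output_dir="t5-large-medical",    # hypothetical output path
    learning_rate=1e-4,
    lr_scheduler_type="constant",     # no warmup or decay
    per_device_train_batch_size=8,    # placeholder, not my exact value
    num_train_epochs=1,               # placeholder
    predict_with_generate=True,
)
```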

I am still investigating the problem, but if anyone has any experience of this, it would be great to hear from you!

Thanks in advance!