Can T5 "forget" Appendix D tasks after fine-tuning?

I have been using the example code run_summarization.py (with the "summarize: " prefix) to do some fine-tuning on a medical dataset, based on the denoising pattern from the Hugging Face website using sentinel tokens like <extra_id_0>, <extra_id_1> (see link). I was hoping to keep using some of the pre-trained tasks (such as "summarize: " or "question: ") that were trained into T5 (in my case t5-large specifically).
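
To be concrete, the denoising pattern I mean looks roughly like this (a minimal sketch based on the Hugging Face docs example; my actual preprocessing of the medical dataset differs in the details):

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-large")
model = T5ForConditionalGeneration.from_pretrained("t5-large")

# Span corruption: sentinels in the input mark the masked spans,
# and the target spells out what each sentinel should expand to.
input_ids = tokenizer(
    "The <extra_id_0> walks in <extra_id_1> park", return_tensors="pt"
).input_ids
labels = tokenizer(
    "<extra_id_0> cute dog <extra_id_1> the <extra_id_2>", return_tensors="pt"
).input_ids

loss = model(input_ids=input_ids, labels=labels).loss
print(loss.item())
```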

So I set up a Colab sheet to demo the Appendix D tasks from the T5 paper, which works fine when using t5-large prior to fine-tuning. I do the fine-tuning on t5-large, which seems to go OK (as far as I can tell: loss goes down, ROUGE scores go up, manual inspection shows reasonable predictions, etc.). Then I push the model back to the Hub and attempt to use the fine-tuned model with the Appendix D tasks in my Colab sheet.
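
For reference, the Colab check is essentially just feeding the Appendix D prefixes through generate(), along these lines (a minimal sketch; the prompts here are paraphrased examples, not the exact ones from my sheet):

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-large")
model = T5ForConditionalGeneration.from_pretrained("t5-large")

# A few Appendix D style task prefixes:
prompts = [
    "translate English to German: The house is wonderful.",
    "cola sentence: The course is jumping well.",
    "summarize: The patient presented with chest pain and shortness of breath.",
]
for p in prompts:
    input_ids = tokenizer(p, return_tensors="pt").input_ids
    out = model.generate(input_ids, max_new_tokens=32)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```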

What I find is that my fine-tuned t5-large seems to have in some way “forgotten” the pre-trained tasks, and no longer answers any of them (no output text). So to check that the fine-tuned model was not completely corrupted (or some other screw-up), I added an example denoising task using <extra_id_0>, <extra_id_1> etc. as above, and the fine-tuned model completed this as expected. So the model is “working”, but only recognises the task it was specifically fine-tuned on.

So what is going on? This is my first project with Transformers, so apologies below if any of my guesses are crazy:

  • Is this what can happen with T5 after the kind of fine-tuning I am attempting?
  • Is it related to the amount of fine-tuning (I use around 100k text samples)?
  • Is it related to the learning rate being too high (I use a constant 1e-4; see the sketch after this list)?
  • Is it related to my own incompetence, and I am completely missing something? :wink:
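
On the learning-rate point, here is roughly how my settings translate into Seq2SeqTrainingArguments (a minimal sketch; the output path, batch size and epoch count are placeholders, not my exact values):

```python
from transformers import Seq2SeqTrainingArguments

# Mirrors the flags I pass to run_summarization.py; the constant 1e-4
# schedule is the setting I suspect may be too aggressive.
args = Seq2SeqTrainingArguments(
    output_dir="t5-large-medical",    # hypothetical output path
    learning_rate=1e-4,
    lr_scheduler_type="constant",     # no warmup or decay
    per_device_train_batch_size=8,    # placeholder, not my exact value
    num_train_epochs=1,               # placeholder
    predict_with_generate=True,
)
```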

I am still investigating the problem, but if anyone has any experience of this, it would be great to hear from you!

Thanks in advance!