Thanks,
Also, I have another minor question about T5, which may be obvious:
Do the word embeddings change during supervised fine-tuning of the model? In other words, is what the model learns from the examples stored as changes to the word embeddings, or is it stored in other parts of the model?