I wonder if there are any problems specific to T5…?
Or maybe the learning rate is too low.
Or maybe something is wrong and the model is not actually learning anything. (I saw a case like that a few days ago, but I can’t find it…)
There was also a post that mentioned BLEU.
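
On the "not actually learning anything" possibility, one quick sanity check is to compare the logged training loss at the start and end of the run. Below is a rough sketch of that idea. The helper name `loss_is_decreasing`, its `window` and `min_rel_drop` parameters, and the loss values are all made up for illustration; they are not from your run or from any library.

```python
from statistics import mean


def loss_is_decreasing(losses, window=3, min_rel_drop=0.01):
    """Hypothetical helper: True if the mean of the last `window` losses is
    more than `min_rel_drop` (relative) below the mean of the first `window`."""
    if len(losses) < 2 * window:
        raise ValueError("need at least 2 * window loss values")
    start = mean(losses[:window])
    end = mean(losses[-window:])
    return (start - end) > min_rel_drop * abs(start)


# Illustrative numbers only, not taken from any real training run.
flat_run = [2.31, 2.30, 2.32, 2.31, 2.30, 2.31]
improving_run = [2.31, 2.10, 1.95, 1.70, 1.52, 1.40]

print(loss_is_decreasing(flat_run))       # loss barely moves
print(loss_is_decreasing(improving_run))  # loss clearly drops
```

If the check comes back false on your own logged losses, that would point toward the learning rate or the training setup rather than the evaluation metric.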