T5: different outputs from model.forward() and model.generate()

This is the weirdest thing I have hit so far, and it cost me a few days of debugging. I suspect there is a bug in T5's greedy search.
During training:
The cross-entropy loss gets smaller and smaller, to the point where taking argmax(dim=2) on the logits reproduces the "labels" exactly.
This suggests that feeding the same input string into model.generate() SHOULD yield exactly the same output.
However, generate() (inference) produces slightly different text.
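One thing worth noting when comparing the two: forward() with labels uses teacher forcing (every step is conditioned on the gold prefix), while generate() conditions each step on the model's own previous outputs, so a single disagreement can cascade. Below is a minimal toy sketch of that mechanism, not T5 itself; the `toy_next_token` lookup table and the token strings are made up for illustration.

```python
# Toy illustration (NOT T5): teacher forcing vs. greedy autoregressive decoding.
# The hypothetical "model" predicts the next token from the current prefix and
# is wrong at exactly one position, which is enough to make the two decoding
# procedures diverge even though per-step accuracy looks near-perfect.

LABELS = ["the", "cat", "sat", "on", "the", "mat"]

def toy_next_token(prefix):
    """Hypothetical next-token predictor: correct except after 'sat'."""
    table = {
        (): "the",
        ("the",): "cat",
        ("the", "cat"): "sat",
        ("the", "cat", "sat"): "in",        # the single mistake: 'in' vs 'on'
        ("the", "cat", "sat", "on"): "the",
        ("the", "cat", "sat", "on", "the"): "mat",
    }
    return table.get(tuple(prefix), "<unk>")

# forward()-style evaluation: each step is conditioned on the GOLD prefix,
# so one mistake stays local (5 of 6 positions still match the labels).
teacher_forced = [toy_next_token(LABELS[:i]) for i in range(len(LABELS))]

# generate()-style greedy decoding: each step is conditioned on the model's
# OWN previous outputs, so the one mistake derails everything after it.
generated = []
for _ in range(len(LABELS)):
    generated.append(toy_next_token(generated))

print(teacher_forced)  # ['the', 'cat', 'sat', 'in', 'the', 'mat']
print(generated)       # ['the', 'cat', 'sat', 'in', '<unk>', '<unk>']
```

This does not rule out a genuine bug in generate(), but it shows why near-perfect argmax under teacher forcing does not by itself guarantee identical greedy-decoded output.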

This is very annoying.