T5: different outputs from model.forward() and model.generate()

This is the weirdest thing I have hit so far, and it cost me a few days of debugging. I suspect there is a bug in T5's greedy search.
During training:
The cross-entropy loss gets smaller and smaller, to the point where taking argmax(dim=2) on the logits reproduces the "labels" exactly.
This suggests that feeding the same input string into model.generate() SHOULD yield exactly the same output.
However, generate() (inference) produces slightly different text.
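One thing worth noting when comparing the two: forward() with labels uses teacher forcing (every step is conditioned on the gold prefix), while generate() conditions each step on the model's own previous outputs, so a single disagreement can cascade. Below is a minimal toy sketch of that mechanism, not T5 itself; the `toy_next_token` lookup table and the token strings are made up for illustration.

```python
# Toy illustration (NOT T5): teacher forcing vs. greedy autoregressive decoding.
# The hypothetical "model" predicts the next token from the current prefix and
# is wrong at exactly one position, which is enough to make the two decoding
# procedures diverge even though per-step accuracy looks near-perfect.

LABELS = ["the", "cat", "sat", "on", "the", "mat"]

def toy_next_token(prefix):
    """Hypothetical next-token predictor: correct except after 'sat'."""
    table = {
        (): "the",
        ("the",): "cat",
        ("the", "cat"): "sat",
        ("the", "cat", "sat"): "in",        # the single mistake: 'in' vs 'on'
        ("the", "cat", "sat", "on"): "the",
        ("the", "cat", "sat", "on", "the"): "mat",
    }
    return table.get(tuple(prefix), "<unk>")

# forward()-style evaluation: each step is conditioned on the GOLD prefix,
# so one mistake stays local (5 of 6 positions still match the labels).
teacher_forced = [toy_next_token(LABELS[:i]) for i in range(len(LABELS))]

# generate()-style greedy decoding: each step is conditioned on the model's
# OWN previous outputs, so the one mistake derails everything after it.
generated = []
for _ in range(len(LABELS)):
    generated.append(toy_next_token(generated))

print(teacher_forced)  # ['the', 'cat', 'sat', 'in', 'the', 'mat']
print(generated)       # ['the', 'cat', 'sat', 'in', '<unk>', '<unk>']
```

This does not rule out a genuine bug in generate(), but it shows why near-perfect argmax under teacher forcing does not by itself guarantee identical greedy-decoded output.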

This is very annoying.