Why are TensorFlow models so much slower than PyTorch models for autoregressive modeling?

I agree, you have to do a lot more to optimize it, especially on the caching side.
It would be great if there were a warning when using TF generate. Thanks for the most valuable reply here, @jplu. :slight_smile:
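
For anyone landing here later, here is a rough sketch of what I mean by the caching side. It is a toy single-head attention in plain Python, not the actual TF or PyTorch generate code. All names (`ToyDecoder`, `W_K`, `W_V`, `decode`) are made up for illustration, the token vector is used directly as the query, and the inputs are fixed vectors rather than sampled tokens. It only shows that reusing the keys and values from earlier steps gives the same outputs with far fewer projections.

```python
import math


def project(x, w):
    """Multiply row vector x by matrix w (given as a list of rows)."""
    return [sum(xi * wi for xi, wi in zip(x, col)) for col in zip(*w)]


def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w / total * v[i] for w, v in zip(weights, values)) for i in range(dim)]


class ToyDecoder:
    """Hypothetical one-layer decoder that counts key/value projections."""

    def __init__(self, w_k, w_v):
        self.w_k, self.w_v = w_k, w_v
        self.kv_projections = 0

    def _key_value(self, x):
        self.kv_projections += 1
        return project(x, self.w_k), project(x, self.w_v)

    def step_without_cache(self, prefix):
        # Recompute keys and values for every token seen so far.
        keys, values = zip(*[self._key_value(x) for x in prefix])
        return attend(prefix[-1], keys, values)

    def step_with_cache(self, x, cache):
        # Project only the newest token and reuse the stored pairs.
        cache.append(self._key_value(x))
        keys, values = zip(*cache)
        return attend(x, keys, values)


def decode(tokens, w_k, w_v, use_cache):
    """Run one attention step per token; return outputs and projection count."""
    decoder = ToyDecoder(w_k, w_v)
    cache, outputs = [], []
    for t in range(len(tokens)):
        if use_cache:
            outputs.append(decoder.step_with_cache(tokens[t], cache))
        else:
            outputs.append(decoder.step_without_cache(tokens[: t + 1]))
    return outputs, decoder.kv_projections


# Arbitrary illustrative weights and inputs (assumptions, not real model data).
W_K = [[0.5, -0.2], [0.1, 0.3]]
W_V = [[1.0, 0.0], [0.5, -1.0]]
TOKENS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5], [-1.0, 2.0]]

plain_out, plain_count = decode(TOKENS, W_K, W_V, use_cache=False)
cached_out, cached_count = decode(TOKENS, W_K, W_V, use_cache=True)
print(plain_count, cached_count)  # 15 projections without the cache, 5 with it
```

Without the cache the number of projections grows quadratically with the sequence length (1 + 2 + ... + n), while with the cache it grows linearly. In a real model the same saving applies in every layer and head, which is why getting the caching right matters so much for generation speed.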