Why are TensorFlow models much slower than PyTorch models for autoregressive generation?


I was experimenting with several models, including GPT-2 and T5. The TensorFlow models seem far slower than the PyTorch ones for the same type of generation, whether it is greedy, beam search, etc.

Any specific reasons for this?


@patrickvonplaten @sgugger For simple greedy decoding with GPT-2 small, TensorFlow takes 7 seconds, while PyTorch takes less than a second.

@lysandre - Can anyone shed some light on this? That would be great.

Are you measuring epoch time or total runtime? I have no direct experience with TensorFlow, but I remember that setting up the graph can take quite a bit of time in TF. You should probably time only the steps themselves and not the setup.
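To make the "time the steps, not the setup" idea concrete, here is a minimal sketch using only the standard library. `FakeModel` and `time_steps` are hypothetical names made up for illustration, and the sleeps merely simulate a one-off setup cost on the first call; this is not the actual GPT-2 code from the question.

```python
import time


def time_steps(step_fn, warmup=1, repeats=5):
    """Return (warmup_seconds, mean_steady_seconds) for step_fn.

    The warm-up calls absorb any one-off setup cost, so the second
    number reflects only the repeated steps.
    """
    start = time.perf_counter()
    for _ in range(warmup):
        step_fn()
    warmup_seconds = time.perf_counter() - start

    start = time.perf_counter()
    for _ in range(repeats):
        step_fn()
    steady_seconds = (time.perf_counter() - start) / repeats
    return warmup_seconds, steady_seconds


class FakeModel:
    """Hypothetical stand-in whose first call pays a setup cost."""

    def __init__(self):
        self.ready = False

    def generate(self):
        if not self.ready:
            time.sleep(0.2)   # simulated one-time setup (assumption)
            self.ready = True
        time.sleep(0.01)      # simulated per-call work (assumption)


model = FakeModel()
warmup_s, steady_s = time_steps(model.generate)
print(f"first call (with setup): {warmup_s:.3f}s")
print(f"mean of later calls:     {steady_s:.3f}s")
```

If the two numbers are close for a real model, setup time is probably not what explains the gap.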

I mean for inference. Here is the snippet. Something is wrong.
@thomwolf - Any thoughts? Thanks…

Maybe @jplu has some insights!

I think it is because PyTorch is more awesome :hugs:

Just had a look at the example code; maybe the .to('cuda') call makes something much faster :thinking:

.to('cuda') is there; I called it when I initialised the model. :slight_smile:
I expected a technical answer though: why is TF slower for generation?

@ huggingface team, does this mean the TensorFlow implementations are not suitable for production, since the latency is higher? Can we conclude that?


The reason is that, for now, the TF models are not optimized for NLG, including the generate function, and we don't recommend using them for that task in production. This is something we are working on, but we cannot give you a specific date.

I agree, a lot more needs to be done to optimize it, especially on the caching side.
It would be great if there were a warning when using TF generate. Thanks for the most valuable reply here, @jplu. :slight_smile:
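For anyone wondering why caching matters so much for generation, here is a toy sketch under heavy assumptions. The `project` function is a hypothetical stand-in for the per-position work a real model does (in a transformer, the cached quantities would typically be per-layer key/value tensors), and none of this is the library's actual generate code. It only counts how often the per-position work runs with and without a cache.

```python
def project(tok):
    """Hypothetical per-position computation (stand-in for real model work)."""
    return (tok * 31 + 7) % 101


def decode_without_cache(prompt, steps):
    tokens = list(prompt)
    calls = 0
    for _ in range(steps):
        states = []
        for tok in tokens:              # recompute every position each step
            states.append(project(tok))
            calls += 1
        tokens.append(sum(states) % 50)  # toy "next token" rule
    return tokens, calls


def decode_with_cache(prompt, steps):
    tokens = list(prompt)
    cache = []                           # states of already-seen positions
    calls = 0
    for _ in range(steps):
        for tok in tokens[len(cache):]:  # only positions not yet cached
            cache.append(project(tok))
            calls += 1
        tokens.append(sum(cache) % 50)   # same toy rule, same result
    return tokens, calls


prompt = [1, 2, 3]
out_plain, calls_plain = decode_without_cache(prompt, steps=5)
out_cached, calls_cached = decode_with_cache(prompt, steps=5)
print(out_plain == out_cached)   # both variants generate the same tokens
print(calls_plain, calls_cached)
```

Without the cache the work grows roughly quadratically with sequence length, while with it each step only processes the newest token, which is why an unoptimized cache path can dominate generation latency.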