Why are TensorFlow models so much slower than PyTorch models for autoregressive modeling?

I think it is because PyTorch is more awesome :hugs:

Just had a look at the example code; maybe the .to('cuda') call is what makes it so much faster :thinking:
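
One way to check the .to('cuda') theory is to time the same forward pass with the model on the CPU and then on the GPU. This is only a rough sketch: `toy_model`, `toy_inputs`, `pick_device` and `time_forward` are made-up names for illustration, and the tiny network stands in for the real one, not the example code from this thread. As far as I know TensorFlow puts ops on a visible GPU by default, so a fair comparison needs both frameworks running on the same device.

```python
import time

import torch


def pick_device() -> torch.device:
    """Return the GPU if PyTorch can see one, otherwise the CPU."""
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")


def time_forward(model, inputs, device, n_runs=10):
    """Average seconds per forward pass with model and inputs on `device`."""
    model = model.to(device).eval()  # moves the parameters in place
    inputs = inputs.to(device)       # tensors are copied, so rebind the name
    with torch.no_grad():
        model(inputs)  # warm-up run, not timed
        if device.type == "cuda":
            torch.cuda.synchronize()  # CUDA kernels run asynchronously
        start = time.perf_counter()
        for _ in range(n_runs):
            model(inputs)
        if device.type == "cuda":
            torch.cuda.synchronize()  # wait for the GPU before stopping the clock
    return (time.perf_counter() - start) / n_runs


# Hypothetical stand-in model and batch, just to have something to time.
toy_model = torch.nn.Sequential(
    torch.nn.Linear(512, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 512),
)
toy_inputs = torch.randn(64, 512)

cpu_time = time_forward(toy_model, toy_inputs, torch.device("cpu"))
print(f"cpu:  {cpu_time * 1e3:.3f} ms per forward pass")

if torch.cuda.is_available():
    gpu_time = time_forward(toy_model, toy_inputs, torch.device("cuda"))
    print(f"cuda: {gpu_time * 1e3:.3f} ms per forward pass")
```

If the two frameworks still differ a lot once they are on the same device, then .to('cuda') is probably not the whole story and the generation loop itself would be the next thing to look at.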