.to(‘cuda’) is there , when I initialised.
I expected a technical answer though, why tf is slower for generations.
@ huggingface team , which means tensorflow implementation are not suitable for production right as latency is higher. Can we conclude it that way?