I trained two models: one for sequence classification and the other for token classification.
The resulting models are almost the same size, but the sequence-classification one is roughly 10x faster on both GPU and CPU.
I checked that the loading method is identical, compared the model sizes, put both models in eval mode, and ran inference under torch.no_grad().
Still, the speed gap remains.
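For reference, here is a minimal sketch of how I time each model (the `benchmark` helper and the callables are my own; on GPU, `torch.cuda.synchronize()` would also be needed around the timed region, since CUDA launches are asynchronous):

```python
import time

def benchmark(fn, n_warmup=3, n_runs=10):
    """Average wall-clock time of fn over n_runs, after n_warmup discarded runs.

    Warm-up matters: the first calls can include kernel compilation and
    memory-allocator setup that would otherwise skew the comparison.
    """
    for _ in range(n_warmup):
        fn()
    # On GPU, call torch.cuda.synchronize() here and again after the loop
    # so that queued kernels are actually finished before reading the clock.
    start = time.perf_counter()
    for _ in range(n_runs):
        fn()
    return (time.perf_counter() - start) / n_runs

# Hypothetical usage: time both models on the *same* batch.
# seq_time = benchmark(lambda: seq_model(**batch))
# tok_time = benchmark(lambda: tok_model(**batch))
```

Both models are fed the same batch, so input length and padding should not explain the difference.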
If anyone has an explanation for this odd behavior, please let me know.
Thanks a lot.