I trained two models: one for sequence classification and the other for token classification.
The resulting models are almost the same size, but the sequence-classification one is roughly 10x faster on both GPU and CPU.
I checked that the loading method is identical, compared the model sizes, put both models in eval mode, and ran inference under torch.no_grad().
Still, the speed gap remains.
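For reference, here is a minimal sketch of how I time each model (the `benchmark` helper and the callables are my own; on GPU, `torch.cuda.synchronize()` would also be needed around the timed region, since CUDA launches are asynchronous):

```python
import time

def benchmark(fn, n_warmup=3, n_runs=10):
    """Average wall-clock time of fn over n_runs, after n_warmup discarded runs.

    Warm-up matters: the first calls can include kernel compilation and
    memory-allocator setup that would otherwise skew the comparison.
    """
    for _ in range(n_warmup):
        fn()
    # On GPU, call torch.cuda.synchronize() here and again after the loop
    # so that queued kernels are actually finished before reading the clock.
    start = time.perf_counter()
    for _ in range(n_runs):
        fn()
    return (time.perf_counter() - start) / n_runs

# Hypothetical usage: time both models on the *same* batch.
# seq_time = benchmark(lambda: seq_model(**batch))
# tok_time = benchmark(lambda: tok_model(**batch))
```

Both models are fed the same batch, so input length and padding should not explain the difference.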
If anyone has an explanation for this odd behavior, please let me know.
Thanks a lot.