I know it would vary a bit depending on the model, but how long should it take to run sentiment analysis on 1 million tweets? Thanks!
It does depend on the model. It also depends quite a lot on the hardware. The answer might be a few seconds for distilbert-base-uncased-finetuned-sst-2-english running preloaded on an 8xA100 rig, or it might be weeks or months on a desktop CPU with GPT-NeoX-20B.
You can test this on your own hardware and model: go through all the setup, run, say, 100 tweets to warm things up, start a timer in the code, run another 100 tweets, check how much time has elapsed, and then multiply that by 10,000. That will give you a fair estimate without too much bias from the one-off setup time and cache warming.
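In case it helps, here is a rough sketch of that warm-up-then-time procedure. The names `estimate_total_seconds` and `dummy_classifier` are made up for illustration, and the dummy just sleeps to simulate work, so you would swap in whatever function actually runs your model on a list of texts and returns the results.

```python
import time


def estimate_total_seconds(classify_batch, sample_texts, total_items,
                           warmup=100, timed=100):
    """Extrapolate total runtime from a short timed run after a warm-up."""
    if len(sample_texts) < warmup + timed:
        raise ValueError("need at least warmup + timed sample texts")

    # Warm-up pass: absorbs one-off costs such as lazy loading and cache warming.
    classify_batch(sample_texts[:warmup])

    # Timed pass on a fresh slice of the sample.
    start = time.perf_counter()
    classify_batch(sample_texts[warmup:warmup + timed])
    elapsed = time.perf_counter() - start

    # Scale up to the full workload (x10,000 for 100 timed tweets out of 1 million).
    return elapsed * (total_items / timed)


def dummy_classifier(texts):
    # Hypothetical stand-in for a real model call; sleeps to simulate work.
    time.sleep(0.0001 * len(texts))
    return ["POSITIVE"] * len(texts)


tweets = [f"example tweet {i}" for i in range(200)]
estimate = estimate_total_seconds(dummy_classifier, tweets, total_items=1_000_000)
print(f"Estimated time for 1M tweets: {estimate / 3600:.2f} hours")
```

The estimate is only as good as the sample: if your real tweets vary a lot in length, or you change the batch size for the full run, the per-tweet time will shift accordingly.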
Right, I’ve been doing what you described, mostly on Google Colab. It’s taking about 4 hours per million. Not sure if it could be better…