I have built a simple web app where the user inputs a few words and the length of the text to generate, and the model takes about a minute to produce the results locally. But when I deploy it to Heroku, it's taking forever (not displaying any results even after a couple of hours). You can check it out here.
Is it because the server’s CPU is too slow/weak? If so, how do I get a faster CPU on Heroku, or can you suggest some other service that would be better than Heroku for deploying GPT-2 based web apps? If not, what’s the issue and how do I fix it?
hey @kristada673, looking at your code it seems that you load the model every time a user provides a prompt. a better approach would be to load the model once when the server spins up and then call it in a dedicated endpoint for prompting.
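to make the difference concrete, here's a minimal sketch with a cheap `time.sleep` standing in for the real `from_pretrained` load (the function names are just illustrative):

```python
import functools
import time

def slow_load():
    # stand-in for the real model load, which can take tens of seconds
    time.sleep(0.1)
    return object()

# anti-pattern: every request pays the full load cost again
def generate_reloading(prompt):
    model = slow_load()
    return prompt + " ..."

# better: only the first call loads; later calls reuse the cached model
@functools.lru_cache(maxsize=1)
def get_model():
    return slow_load()

def generate_cached(prompt):
    model = get_model()
    return prompt + " ..."
```

with the real model the gap is minutes per request, not fractions of a second.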
depending on your use case, a simpler alternative to heroku would be streamlit - you can find many examples online using GPT-2 with it (e.g. here)
in addition to @lewtun’s suggestion, try the distilled version of gpt2 - it will run faster too
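for reference, swapping in the distilled model is a one-line change (assuming you’re using the `transformers` pipeline API):

```python
from transformers import pipeline

# distilgpt2 is a distilled, much smaller version of gpt2,
# so it loads and generates noticeably faster on CPU
generator = pipeline("text-generation", model="distilgpt2")
result = generator("Hello, I'm a language model,", max_length=30)
print(result[0]["generated_text"])
```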
Thanks for the suggestion. On my local machine, it downloads the model only the first time; subsequently, it uses the downloaded model. Is the process not the same when deployed on Heroku? If not, could you please show how to do this?
Thanks for the suggestion. I tried out ‘distilgpt2’, but the text it generates is of much lower quality (understandably, as it’s a much smaller model).
by default the model should be cached in ~/.cache/huggingface/ (see docs), so maybe you can inspect the heroku machine to see if that’s the case?
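e.g. you could open a shell on the dyno with `heroku run bash` and check the cache dir:

```shell
# show what (if anything) is in the default huggingface cache
du -sh ~/.cache/huggingface/ 2>/dev/null || echo "cache dir is empty or missing"
```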
i still think the best approach would be to wrap your logic in a fastapi / flask app and then deploy that on heroku. this will let you separate loading the model from generating predictions and will be much faster. of course, you can also try out streamlit, which is easy to get started with
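a rough flask sketch of that separation (endpoint and field names are just placeholders): the model loads once at import time, so only the server startup is slow, and each request only runs inference.

```python
from flask import Flask, jsonify, request
from transformers import pipeline

app = Flask(__name__)

# loaded exactly once, when the server starts - not on every request
generator = pipeline("text-generation", model="gpt2")

@app.route("/generate", methods=["POST"])
def generate():
    payload = request.get_json()
    out = generator(payload["prompt"], max_length=int(payload.get("max_length", 50)))
    return jsonify(generated_text=out[0]["generated_text"])

# on heroku you'd serve this with something like: gunicorn app:app
```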