GPT2 working perfectly in local system, but doesn't generate text (stuck) when deployed in server

kristada673 · May 14, 2021, 10:17am

I have built a simple web app where the user enters a few words and the length of the text to generate. Locally, the model takes a minute to produce the results. But when I deploy it on Heroku, it takes forever (no results are displayed even after a couple of hours). You can check it out here.

Is it because the server’s CPU is too slow/weak? If yes, how do I use a faster CPU on Heroku, or can you suggest some other service instead of Heroku that would be better for deploying GPT2-based web apps? If not, what’s the issue and how do I fix it?

lewtun · May 17, 2021, 10:30am

hey @kristada673, looking at your code it seems that you load the model every time a user provides a prompt. a better approach would be to load the model once when the server spins up and then call it in a dedicated endpoint for prompting.

depending on your use case, a simpler alternative to heroku would be streamlit - you can find many examples online using GPT-2 with it (e.g. here)

raveenb · May 17, 2021, 11:42am

in addition to @lewtun’s suggestion, try the distilled version of gpt2 (distilgpt2); it will run faster too

kristada673 · May 18, 2021, 2:35am

Thanks for the suggestion. On my local machine, it downloads the model only the first time; subsequently, it uses the downloaded model. Is the process not the same when deployed on Heroku? If not, could you please show how to do this?

kristada673 · May 18, 2021, 2:37am

Thanks for the suggestion. I tried out ‘distilgpt2’, but the text it generates is of much inferior quality (understandably, as it’s a much smaller model).

lewtun · May 18, 2021, 11:24am

by default the model should be cached in ~/.cache/huggingface/ (see docs) so maybe you can inspect the heroku machine to see if that’s the case?
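one way to do that inspection is a small script run on the dyno (for example from a one-off shell). this is only a sketch: the helper name `cache_report` is made up for illustration, and the default path is simply the cache location mentioned above. heroku's dyno filesystem is ephemeral, so it is plausible (an assumption worth checking, not a diagnosis) that the cache is empty after a restart.

```python
from pathlib import Path


def cache_report(cache_dir=None):
    """Summarize a cache directory: does it exist, how many files, how many bytes.

    cache_report is a hypothetical helper, not part of any library.
    """
    # Default to the cache location mentioned in the thread.
    root = Path(cache_dir) if cache_dir else Path.home() / ".cache" / "huggingface"
    if not root.exists():
        return {"exists": False, "files": 0, "bytes": 0}
    # Skip symlinks so that a linked file is not counted twice.
    files = [p for p in root.rglob("*") if p.is_file() and not p.is_symlink()]
    return {
        "exists": True,
        "files": len(files),
        "bytes": sum(p.stat().st_size for p in files),
    }


if __name__ == "__main__":
    print(cache_report())
```

if the report shows no files (or the directory does not exist) right before a request, the model is probably being downloaded again on that machine.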

i still think the best approach would be to wrap your logic in a fastapi / flask app and then deploy that on heroku. this will allow you to separate the loading of the model from generating predictions and will be much faster. of course you can also try out streamlit which is easy to get started with
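a minimal sketch of that separation with flask, assuming the `text-generation` pipeline from transformers and the `gpt2` checkpoint. the names `get_generator`, `generate_text`, `create_app` and the `/generate` route are illustrative choices, not taken from your code:

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def get_generator():
    """Load GPT-2 once per process; repeat calls return the cached pipeline."""
    # Lazy import: transformers is heavy, so only pull it in when needed.
    from transformers import pipeline
    return pipeline("text-generation", model="gpt2")


def generate_text(generator, prompt, max_length=50):
    """Run an already-loaded generator on one prompt and return the text."""
    outputs = generator(prompt, max_length=max_length, num_return_sequences=1)
    return outputs[0]["generated_text"]


def create_app(generator=None):
    """Hypothetical Flask app factory: the model is loaded here, not per request."""
    from flask import Flask, jsonify, request

    app = Flask(__name__)
    # Loading happens once, when the app is created at server start-up.
    model = generator if generator is not None else get_generator()

    @app.route("/generate", methods=["POST"])
    def generate():
        payload = request.get_json(force=True)
        max_length = int(payload.get("max_length", 50))
        text = generate_text(model, payload["prompt"], max_length)
        return jsonify({"text": text})

    return app
```

assuming the file is saved as `app.py`, something like `gunicorn "app:create_app()"` would start it. the point of the design is that each request only pays for generation, because the pipeline object already exists when the request arrives.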
