Downloads Model Every Time

The model is “meta-llama/Llama-2-7b-chat-hf”

I'm using the sample code that queries the model with “I liked “Breaking Bad” and “Band of Brothers”…”

When the Space runs, it downloads the 9 GB+ model, which takes a while, and then outputs a response.
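For reference, the app is essentially the stock Llama 2 text-generation example. Paraphrasing (this is a sketch, not the exact sample code), it does roughly this, and loading the pipeline is what triggers the big download:

```python
import torch
from transformers import pipeline

# Building the pipeline downloads the ~9 GB of weights into the cache.
# Note: Llama 2 is gated, so the Space also needs an access token configured.
pipe = pipeline(
    "text-generation",
    model="meta-llama/Llama-2-7b-chat-hf",
    torch_dtype=torch.float16,
    device_map="auto",
)

prompt = 'I liked "Breaking Bad" and "Band of Brothers"…'
print(pipe(prompt, max_new_tokens=256)[0]["generated_text"])
```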

But if I refresh the page or use the embed code, the whole process of downloading the model starts again.

I am at a loss to understand why this happens; I have even upgraded to persistent storage.


What Space SDK are you using?

Streamlit.

I just read the section about setting the HF_HOME variable, so I'm restarting and trying that.

OK, I now have the model persisting on disk, which stops it from having to be downloaded every time.
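For anyone else hitting this, the change was pointing the Hugging Face cache at the persistent storage volume. A minimal sketch, assuming persistent storage is mounted at the default /data path on Spaces:

```python
import os

# Point the Hugging Face cache at persistent storage so downloaded weights
# survive restarts. This must be set before transformers is imported.
os.environ["HF_HOME"] = "/data/.huggingface"

from transformers import pipeline

pipe = pipeline("text-generation", model="meta-llama/Llama-2-7b-chat-hf")
```

The same thing can be done without touching the code by adding HF_HOME as a variable in the Space settings.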

I'm running on an A10G, and it seems that every time I refresh the page the model needs to be loaded into GPU memory again, which takes a few minutes.

Is there a way to persist the model in memory? I have set the GPU to sleep after an hour.
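Reading the Streamlit docs, it looks like the script re-runs on every page refresh, and st.cache_resource is meant for exactly this case, so I'm going to try wrapping the model load in it, something like:

```python
import streamlit as st
import torch
from transformers import pipeline

@st.cache_resource  # keep the loaded pipeline in memory across reruns and sessions
def load_model():
    return pipeline(
        "text-generation",
        model="meta-llama/Llama-2-7b-chat-hf",
        torch_dtype=torch.float16,
        device_map="auto",
    )

pipe = load_model()
```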

What was the specific setting you made? I’m having the same issue with the model downloading every time I restart.