Downloads Model Every Time

mattsputnikdigital · August 24, 2023, 3:17pm

The model is “meta-llama/Llama-2-7b-chat-hf”

I'm using the sample code, which queries the model with “I liked “Breaking Bad” and “Band of Brothers”…”

When the Space runs, it downloads the 9GB+ model, which takes a while, and then outputs a response.

But if I refresh the page or use the embed code, the whole process of downloading the model starts again.

I am at a loss to understand why this happens; I have even upgraded to persistent storage.

freddyaboulton · August 24, 2023, 3:48pm

What Space SDK are you using?

mattsputnikdigital · August 24, 2023, 3:59pm

Streamlit.

I just read the section about setting the HF_HOME variable, so I'm restarting and trying that.
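For anyone following along, here is a minimal sketch of what setting HF_HOME can look like from inside the app. It assumes the Space's persistent storage is mounted at /data and uses a hypothetical helper name, configure_hf_cache; neither detail comes from this thread. The variable has to be set before transformers or huggingface_hub is imported, because the cache location is read at import time. Setting HF_HOME as a variable in the Space settings should achieve the same thing without any code.

```python
import os
from pathlib import Path


def configure_hf_cache(storage_root: str = "/data") -> str:
    """Point the Hugging Face cache at a directory that survives restarts.

    storage_root is an assumption: /data is where persistent storage is
    expected to be mounted on a Space. Adjust it if yours differs.
    """
    cache_dir = Path(storage_root) / ".huggingface"
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Libraries read HF_HOME when they are imported, so set it first.
    os.environ["HF_HOME"] = str(cache_dir)
    return str(cache_dir)


# Intended usage at the very top of the app script (left commented out here):
# configure_hf_cache()
# from transformers import pipeline  # import only after HF_HOME is set
```

With this in place, the first run still downloads the model, but later restarts should find the files under the persistent directory instead of fetching them again.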

mattsputnikdigital · August 25, 2023, 8:27am

OK, I now have the model persisting on disk, which stops it from being downloaded every time.

I'm running on an A10G, and it seems that every time I refresh the page the model needs to be loaded into GPU memory, which takes a few minutes.

Is there a way to persist the model in memory? I have set the GPU to sleep after an hour.
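One common approach in Streamlit, sketched here as a suggestion rather than a confirmed fix for this Space: Streamlit reruns the script for each new session and interaction, so an uncached model load repeats on every refresh. Wrapping the load in st.cache_resource keeps a single loaded object alive for as long as the server process runs. The function name load_pipeline and the pipeline arguments are illustrative assumptions (device_map="auto" also needs the accelerate package), and the functools fallback exists only so the snippet can be imported where Streamlit is not installed.

```python
import functools

try:
    import streamlit as st
    cache_resource = st.cache_resource
except ImportError:
    # Fallback so this sketch can be imported outside a Streamlit install.
    cache_resource = functools.lru_cache(maxsize=None)

MODEL_ID = "meta-llama/Llama-2-7b-chat-hf"


@cache_resource
def load_pipeline(model_id: str = MODEL_ID):
    """Load the model once per server process and reuse it across reruns."""
    # Imported lazily so the heavy dependency loads only on first use.
    from transformers import pipeline

    return pipeline("text-generation", model=model_id, device_map="auto")


# Intended usage in the app body (left commented out here):
# generator = load_pipeline()   # slow on the first call, cached afterwards
# st.write(generator("I liked ...")[0]["generated_text"])
```

The cache lives in the running process, so the model has to be loaded again after the Space restarts or wakes from sleep; only refreshes and new sessions within the same process benefit.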

dwipper · November 2, 2023, 10:52pm

What was the specific setting you made? I’m having the same issue with the model downloading every time I restart.
