Cache large models on GPU instances between reboots


I’m using some A10G instances to run a 20GB model in a private Space, and I’ve got the Space set to shut down after 15 minutes of no use to save $.

Each cold start, though, takes a few minutes to re-download all the model weights from the Hub, which is a pain to wait for and, I'm sure, an annoying amount of bandwidth for Hugging Face.

Is there any method of caching (or similar) these model weights, or a recommended way to load/store them that doesn’t require the space to reload them each time?


Hi @alexedw,

There is currently no built-in way to cache anything between reboots of a Space.

The only workaround for now is to use a Space with the Docker SDK and download the weights as a step of the image build, so they are baked into the image itself.


Do you have an example of what a Dockerfile would look like to cache the weights?

Thanks @chris-rannou

Hi @brianjking,

A minimal example, using the transformers CLI, could be:

FROM python:3.8-slim-buster

# Set up a new user named "user" with user ID 1000
RUN useradd -m -u 1000 user

# Switch to the "user" user
USER user

# Set home to the user's home directory and add user-installed
# scripts to the PATH
ENV HOME=/home/user \
    PATH=/home/user/.local/bin:$PATH

# Set the working directory to the user's home directory
# (WORKDIR creates the directory if it does not exist)
WORKDIR $HOME/app

# Install transformers (provides the transformers-cli entry point)
RUN pip install --no-cache-dir transformers

# Download the model weights at build time so they are baked into
# the image and survive Space restarts
RUN ["transformers-cli", "download", "distilbert-base-cased-distilled-squad"]
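
If you'd rather drive the download from Python than from the CLI, a sketch using `snapshot_download` from the `huggingface_hub` package (same idea: fetch the repo at build time so it lands in the local cache that `from_pretrained` reads from; the model id here just mirrors the Dockerfile above, swap in your own):

```python
# Sketch: pre-download a model repo into the local Hugging Face cache
# (~/.cache/huggingface by default) so the running Space finds it on disk.
from huggingface_hub import snapshot_download

# For a private model, also pass token="..." (e.g. from a build-time secret).
local_path = snapshot_download(repo_id="distilbert-base-cased-distilled-squad")
print(local_path)  # directory containing the cached snapshot
```

You would invoke this from the Dockerfile with something like `RUN python download.py` in place of the `transformers-cli` step.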