To avoid re-downloading the models every time my Docker container starts, I want to download them once while building the Docker image. I wrote a small script that runs the following during the build:
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from sentence_transformers import SentenceTransformer

AutoTokenizer.from_pretrained(configs.get("models_names.tockenizer"))
SentenceTransformer(configs.get("models_names.sentence_embedding"))
AutoModelForSeq2SeqLM.from_pretrained(configs.get("models_names.paraphraser"))
```
While this works in theory, it breaks in practice: the models are not just downloaded, they are also loaded into memory, which raises an out-of-memory error on the build machine. I then read about
snapshot_download (ref) and replaced my script with
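The replacement presumably looked something like the sketch below. The repo ids here are tiny public placeholder models, standing in for the values returned by the configs.get(...) calls above:

```python
import os
from huggingface_hub import snapshot_download

# Placeholder repo ids (small public test models); the real script would pass
# configs.get("models_names.sentence_embedding") and
# configs.get("models_names.paraphraser") here instead.
emb_path = snapshot_download("sshleifer/tiny-gpt2")
par_path = snapshot_download("hf-internal-testing/tiny-random-t5")

# snapshot_download returns the local directory inside the Hub cache
# where the repository files were stored.
print(emb_path, par_path)
```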
While these two lines do download the same files, transformers fails to load the models from that cache and attempts to download them again whenever a from_pretrained call is made. This remains the case even if I explicitly set TRANSFORMERS_CACHE to point to the Hugging Face Hub cache directory.
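Setting the variable looks roughly like this. A minimal sketch: the path is an assumption (the default Hub cache location on Linux), and the actual directory may differ:

```python
import os

# Assumed default Hugging Face Hub cache location; replace with wherever
# snapshot_download actually stored the files.
os.environ["TRANSFORMERS_CACHE"] = os.path.expanduser("~/.cache/huggingface/hub")

# Note: this must run before `import transformers`, since transformers reads
# the variable once at import time.
```

Newer releases of transformers and huggingface_hub prefer HF_HOME or HF_HUB_CACHE; TRANSFORMERS_CACHE is deprecated there, which may also matter here.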
I noticed that the cache layout generated by snapshot_download differs from the one transformers produces via from_pretrained. Is that why it’s not working?
In general, what is the best way to pre-download the models during the build phase?