Download most used models in container and load them when necessary

green-matteo · September 29, 2023, 2:12pm

Hello,

I’m writing to ask whether there is a better strategy than the one I have in mind.

I have a service that should be able to run different Hugging Face models (for sentence embedding). The service will be containerised (probably with Docker), with a load balancer spawning more or fewer instances of it as needed.

As I said, the service will hold multiple models from the hub. To simplify, it will receive some text to work on and the name of the model to use for it. In 95% of cases this model is one of 5 models we already know; in the other 5% it’s something else.

My idea would be:

Initialise a dict with the 5 models:


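from transformers import AutoModel, AutoTokenizer

# Preload the five most-used models once, at service start-up.
for model_name in list_of_most_used_models:
    tokenizer_dict[model_name] = AutoTokenizer.from_pretrained(model_name)
    model_dict[model_name] = AutoModel.from_pretrained(model_name).to(self.device)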

Then, every time a request comes in for another model, I do something like:

if model_name in model_dict:
    model_dict[model_name].do_something()
else:
    model_dict[model_name] = ....
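
To make the idea concrete, here is a minimal sketch of the "preload the common ones, load the rest on first use" pattern. The names `ModelCache` and `load_from_hub` are hypothetical helpers invented for this illustration, not part of transformers; only `AutoTokenizer.from_pretrained` and `AutoModel.from_pretrained` are real library calls. The loader is passed in as a parameter so the caching logic can be tried without downloading anything.

```python
from typing import Any, Callable, Dict, Iterable, Tuple


def load_from_hub(model_name: str, device: str = "cpu") -> Tuple[Any, Any]:
    """Load (tokenizer, model) with transformers.

    The import is done lazily so the cache class below can be used
    (and tested) with any other loader function.
    """
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name).to(device)
    model.eval()  # inference only
    return tokenizer, model


class ModelCache:
    """Hypothetical helper: preloads frequent models, loads others on first use."""

    def __init__(
        self,
        loader: Callable[[str], Tuple[Any, Any]] = load_from_hub,
        preload: Iterable[str] = (),
    ) -> None:
        self._loader = loader
        self._entries: Dict[str, Tuple[Any, Any]] = {}
        # Pay the loading cost for the well-known models at start-up.
        for name in preload:
            self.get(name)

    def get(self, model_name: str) -> Tuple[Any, Any]:
        """Return (tokenizer, model), loading them only if not cached yet."""
        if model_name not in self._entries:
            self._entries[model_name] = self._loader(model_name)
        return self._entries[model_name]
```

With this sketch, a request handler would call something like `tokenizer, model = cache.get(model_name)` for every request. Note that it never evicts anything, so the rare 5% of models would accumulate in memory over time, and it is not thread-safe; both would probably need attention in a real service.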

My idea was to build the Docker image with the 5 models pre-downloaded, so that they are already there when a container instance is created and I do not have to download them from the hub. My question is: if I “pre-download” them, can I still load them like this:

tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-180b")

or do I have to do something to let AutoTokenizer know that it can use the local version? When I download them, do I have to specify a particular directory?
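
For illustration, here is one way the pre-download step could be sketched, assuming the `huggingface_hub` package is available in the image. `snapshot_download` with `repo_id` and `local_dir` is a real function of that package; the `/models` root, the `MODELS_ROOT` environment variable and the helpers `local_model_dir`, `predownload` and `resolve` are assumptions made up for this example. As far as I understand, a simpler alternative is to rely on the default cache: if the cache location (for example via `HF_HOME`) is the same at build time and at run time, `from_pretrained` with the plain hub name should pick up the cached files.

```python
import os
from pathlib import Path

# Hypothetical location inside the image; override with MODELS_ROOT if needed.
MODELS_ROOT = Path(os.environ.get("MODELS_ROOT", "/models"))


def local_model_dir(model_name: str, root: Path = MODELS_ROOT) -> Path:
    """Map a hub id such as 'org/name' to a folder like <root>/org--name."""
    return root / model_name.replace("/", "--")


def predownload(model_names, root: Path = MODELS_ROOT) -> None:
    """Run once at image build time: fetch each repo into its own folder."""
    from huggingface_hub import snapshot_download

    for name in model_names:
        snapshot_download(repo_id=name, local_dir=str(local_model_dir(name, root)))


def resolve(model_name: str, root: Path = MODELS_ROOT) -> str:
    """Return the local folder if the model was pre-downloaded, else the hub id."""
    path = local_model_dir(model_name, root)
    return str(path) if path.is_dir() else model_name
```

At run time the service would then call, for example, `AutoTokenizer.from_pretrained(resolve(model_name))`: a pre-downloaded model is read from its folder, and anything else falls back to a normal download from the hub.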

More generally, is there a better way to do what I want, or should this already work well?
