We have a resource pool of GPU servers on which we deploy user-specified models. In our scenario, a server may need to switch the model it serves frequently.
In the usual workflow, one downloads the model weights to local disk before loading them onto a device such as a GPU. For our use case this is inefficient and takes up too much disk space.
So I'm wondering whether it's possible to load model weights directly onto the device, i.e. pull them straight from the Hub, cache them in memory, and then load them onto the GPU, skipping the disk entirely?
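Conceptually, something like the sketch below is what I have in mind. It's only a rough illustration under some assumptions: the checkpoint is a single `model.safetensors` file that fits in host RAM, and the repo id is a placeholder. It fetches the weight file over HTTP into memory with `requests`, deserializes it with `safetensors.torch.load`, and moves the model onto the GPU without writing the weights to disk (only the small config file is cached):

```python
import requests
import torch
from safetensors.torch import load as load_safetensors
from transformers import AutoConfig, AutoModelForCausalLM

REPO_ID = "some-org/some-model"   # placeholder repo id
FILENAME = "model.safetensors"    # assumes a single-file (unsharded) checkpoint

# Instantiate the architecture without fetching any weights;
# only the small config file is downloaded.
config = AutoConfig.from_pretrained(REPO_ID)
model = AutoModelForCausalLM.from_config(config)

# Pull the weight file straight into memory; nothing is written to disk.
url = f"https://huggingface.co/{REPO_ID}/resolve/main/{FILENAME}"
resp = requests.get(url, timeout=300)
resp.raise_for_status()

# Deserialize the in-memory bytes into a state dict of CPU tensors,
# load it into the model, and move the model onto the GPU.
state_dict = load_safetensors(resp.content)
model.load_state_dict(state_dict)
model.to("cuda")
```

For a sharded checkpoint one would presumably have to repeat the fetch-and-load step per shard, and for frequent model switching the downloaded bytes could be kept in an in-memory cache keyed by repo id. Is there an existing/supported way to do this, or a better approach?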
Thanks for your advice.