Optimizing Model Loading with a CPU Bottleneck

Thanks, I realized that was the problem.
At that point I didn't have infra that could take care of sharding it either.

I've now been able to get access to some L40s, which should circumvent this problem.

I will probably work through sharding it as you suggested and keep that as my own copy of the model.
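
In case it helps anyone else hitting the same thing, here's a minimal sketch of what I have in mind for the sharded copy, assuming a standard transformers model; the repo id, output path, and shard size are just placeholders:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id and output directory; substitute the actual model.
model_id = "org/large-model"
out_dir = "./large-model-sharded"

# low_cpu_mem_usage avoids materializing an extra full copy of the weights
# in RAM during loading, which is where the CPU-side bottleneck came from.
model = AutoModelForCausalLM.from_pretrained(model_id, low_cpu_mem_usage=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Save a sharded safetensors copy; max_shard_size controls the size of each shard file.
model.save_pretrained(out_dir, safe_serialization=True, max_shard_size="5GB")
tokenizer.save_pretrained(out_dir)
```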

I was hoping to check with the maintainers about actually adding the safetensors weights (there's still an open PR for that) and then the sharded version too.
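
If the maintainers are open to it, a rough sketch of how I'd propose the sharded safetensors weights as a pull request on the model repo rather than committing directly (repo id is a placeholder again):

```python
# Open a PR on the upstream model repo with the sharded safetensors weights.
model.push_to_hub(
    "org/large-model",
    safe_serialization=True,
    max_shard_size="5GB",
    create_pr=True,
)
tokenizer.push_to_hub("org/large-model", create_pr=True)
```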
