Optimizing Model Loading with a CPU Bottleneck

Accelerate Utilities

I think that’s the correct workaround. If you want to treat VRAM, RAM, and disk as a single pool of memory, you should use the Accelerate library.

Setups with less RAM than VRAM are uncommon, so this problem is rarely reported, but excessive RAM consumption during model loading can occasionally become a bottleneck.
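
For reference, here is a rough sketch of what that could look like when loading through Transformers, which hands placement to Accelerate when `device_map="auto"` is set. The helper names `build_max_memory` and `load_with_offload`, the `offload` directory, and the memory figures are only placeholders I made up for illustration, so adjust them to your hardware.

```python
from typing import Dict, Union


def build_max_memory(gpu_gib: int, cpu_gib: int, gpu_index: int = 0) -> Dict[Union[int, str], str]:
    """Hypothetical helper: build a per-device memory budget.

    Integer keys are GPU indices and "cpu" is system RAM, which is the
    format the `max_memory` argument accepts.
    """
    return {gpu_index: f"{gpu_gib}GiB", "cpu": f"{cpu_gib}GiB"}


def load_with_offload(model_id: str, gpu_gib: int, cpu_gib: int, offload_dir: str = "offload"):
    """Hypothetical wrapper: load a causal LM across VRAM, RAM, and disk."""
    # Imported lazily so build_max_memory works without transformers installed.
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",  # let Accelerate decide where each layer goes
        max_memory=build_max_memory(gpu_gib, cpu_gib),  # cap GPU and RAM usage
        offload_folder=offload_dir,  # weights that fit nowhere else spill to disk
        low_cpu_mem_usage=True,  # avoid materializing a full extra copy in RAM
    )
```

You would call it as something like `load_with_offload("your-model-id", gpu_gib=20, cpu_gib=8)`. Setting the `"cpu"` budget below your actual RAM is the part that matters for the low-RAM case, since whatever does not fit within the budgets is offloaded to disk rather than held in RAM.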