Optimizing Model Loading with a CPU Bottleneck

I was hoping to check with the maintainers about actually adding the safetensors weights (there is still an open PR for that), and then the sharded version too.

It seems a PR has already been opened for that. All that's needed is for the maintainer to merge it, but I guess it's been forgotten… :sweat_smile:
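For context on what "the sharded version" involves: in the Hugging Face ecosystem, `model.save_pretrained(out_dir, safe_serialization=True, max_shard_size="5GB")` writes the weights as sharded safetensors files plus a `model.safetensors.index.json` that maps each tensor name to its shard. Below is a minimal stdlib-only sketch of that index layout, with an illustrative greedy sharding rule; tensor names and the byte threshold are hypothetical, not taken from the PR in question.

```python
def shard_state_dict(sizes, max_shard_bytes):
    """Greedily group tensors into shards and build an index dict in the
    style of model.safetensors.index.json (a "weight_map" from tensor
    name to shard file, plus total size metadata).

    sizes: dict mapping tensor name -> size in bytes (illustrative input;
    a real checkpoint would derive this from the actual tensors).
    """
    shards, current, current_bytes = [], [], 0
    for name, nbytes in sizes.items():
        # Start a new shard once adding this tensor would exceed the cap.
        if current and current_bytes + nbytes > max_shard_bytes:
            shards.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += nbytes
    if current:
        shards.append(current)

    n = len(shards)
    weight_map = {
        name: f"model-{i + 1:05d}-of-{n:05d}.safetensors"
        for i, shard in enumerate(shards)
        for name in shard
    }
    return {
        "metadata": {"total_size": sum(sizes.values())},
        "weight_map": weight_map,
    }
```

With three 4-byte tensors and an 8-byte cap, this yields two shards, with the first two tensors mapped to `model-00001-of-00002.safetensors` and the third to `model-00002-of-00002.safetensors`.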