Optimizing Model Loading with a CPU Bottleneck

Thanks, I realized that was the problem.
At that point I didn't have infra that could take care of sharding it either.

I've now been able to get access to some L40s, which should circumvent this problem.

I will probably work through sharding it as you suggested and keep that as my own copy of the model.
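
In case it helps anyone else hitting the same thing, here's a minimal sketch of what I have in mind for the sharded copy, assuming a standard transformers model; the repo id, output path, and shard size are just placeholders:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id and output directory; substitute the actual model.
model_id = "org/large-model"
out_dir = "./large-model-sharded"

# low_cpu_mem_usage avoids materializing an extra full copy of the weights
# in RAM during loading, which is where the CPU-side bottleneck came from.
model = AutoModelForCausalLM.from_pretrained(model_id, low_cpu_mem_usage=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Save a sharded safetensors copy; max_shard_size controls the size of each shard file.
model.save_pretrained(out_dir, safe_serialization=True, max_shard_size="5GB")
tokenizer.save_pretrained(out_dir)
```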

I was hoping to check with the maintainers about actually adding the safetensors weights (there's still an open PR for that) and then the sharded version too.
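
If the maintainers are open to it, a rough sketch of how I'd propose the sharded safetensors weights as a pull request on the model repo rather than committing directly (repo id is a placeholder again):

```python
# Open a PR on the upstream model repo with the sharded safetensors weights.
model.push_to_hub(
    "org/large-model",
    safe_serialization=True,
    max_shard_size="5GB",
    create_pr=True,
)
tokenizer.push_to_hub("org/large-model", create_pr=True)
```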
