I’d like to defer initialization of my model until after DeepSpeed finishes sharding, so that I don’t run out of memory (OOM). Any tips on how to set that up?
I found the init_empty_weights context manager, which puts the model on the “meta” device, but I’m not sure how to initialize the weights after calling accelerator.prepare in a way that’s compatible with DeepSpeed.
In case it’s relevant, I’m using a custom model, not a Hugging Face model.
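
To make the question concrete, here is a minimal sketch of the deferred-initialization pattern I mean, in plain PyTorch with no DeepSpeed or Accelerate involved. TinyModel and materialize are hypothetical names used only for illustration, and torch.device("meta") stands in for init_empty_weights. It assumes PyTorch 2.0 or later and that every layer defines reset_parameters. I don't know whether this is the right approach once DeepSpeed has sharded the parameters, which is really what I'm asking.

```python
import torch
from torch import nn


class TinyModel(nn.Module):
    """Hypothetical stand-in for my custom model."""

    def __init__(self, dim: int = 8):
        super().__init__()
        self.fc1 = nn.Linear(dim, dim)
        self.fc2 = nn.Linear(dim, 1)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))


def materialize(model: nn.Module, device: str = "cpu") -> nn.Module:
    """Allocate real (uninitialized) storage, then re-run each layer's own init."""
    model.to_empty(device=device)
    for module in model.modules():
        # Assumption: layers expose reset_parameters(), as nn.Linear does.
        if hasattr(module, "reset_parameters"):
            module.reset_parameters()
    return model


# Build on the meta device: parameters have shapes and dtypes but no storage.
with torch.device("meta"):
    model = TinyModel()

on_meta = all(p.is_meta for p in model.parameters())

# Later, once it is safe to allocate memory, give the weights real values.
model = materialize(model)
out = model(torch.randn(2, 8))
```

What I can't work out is where the materialize step belongs relative to accelerator.prepare, and whether it would still work on parameters that DeepSpeed has already partitioned.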