How to Set Up Deferred Init with Accelerate + DeepSpeed?

I’d like to defer initialization of my model until after DeepSpeed finishes sharding to avoid running OOM. Any tips on how to set that up?

I found the init_empty_weights context manager, which puts the model on the "meta" device, but I'm not sure how to initialize the weights after calling accelerator.prepare in a way that's compatible with DeepSpeed.

In case it's relevant, I want to note that I'm using a custom model, not a Hugging Face model.
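
For reference, this is roughly the pattern I have in mind (a simplified sketch; MyCustomModel and model_config are stand-ins for my own code):

import torch
from accelerate import Accelerator, init_empty_weights

accelerator = Accelerator()

with init_empty_weights():
    # Parameters are created on the "meta" device, so no real memory is allocated yet.
    model = MyCustomModel(model_config)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# This is the part I'm unsure about: after prepare(), how do I materialize / load
# the real weights in a way that works with DeepSpeed ZeRO-3 sharding?
model, optimizer = accelerator.prepare(model, optimizer)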


I have the same question. When I use init_empty_weights() and load_checkpoint_and_dispatch, I don't get OOM errors when loading the model, but with DeepSpeed I do. Any solutions?
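
Concretely, something along these lines works for me without DeepSpeed (a sketch using a Hugging Face causal LM just for illustration; MODEL_PATH and the no_split class name are placeholders):

from accelerate import init_empty_weights, load_checkpoint_and_dispatch
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained(MODEL_PATH)

with init_empty_weights():
    # Build the model skeleton on the "meta" device; no weights in memory yet.
    model = AutoModelForCausalLM.from_config(config)

# Stream the checkpoint shards in and place them across the available devices.
model = load_checkpoint_and_dispatch(
    model,
    checkpoint=MODEL_PATH,
    device_map="auto",
    no_split_module_classes=["LlamaDecoderLayer"],
)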


You may need something like with deepspeed.zero.Init():.
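
For a custom model, a rough sketch of that could look like the following (MyCustomModel, model_config, and the config path are placeholders; with a Hugging Face model and the Trainer, from_pretrained does this automatically when ZeRO-3 is detected):

import deepspeed

# Constructing the model inside zero.Init partitions the parameters across ranks
# as each submodule is created, so the full model never exists on a single device.
with deepspeed.zero.Init(config_dict_or_path="ds_config_zero3.json"):
    model = MyCustomModel(model_config)

# The model is then handed to the usual path, e.g. accelerator.prepare(...) or the Trainer.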

Shouldn't the Trainer already take care of this when running with Accelerate and DeepSpeed?


I think yes, as long as the deepspeed= argument is passed to TrainingArguments, except for some pitfalls.

training_args = TrainingArguments(..., deepspeed="ds_config_zero3.json")
trainer = Trainer(...)

I have the TrainingArguments before the Trainer as well. The thing is, when I don't use deepspeed.zero.Init(), I don't get this in my logs:

Detected DeepSpeed ZeRO-3: activating zero.init() for this model

But when I do use it, that line does appear in my logs, yet it is not compatible with quantization: it gives errors when loading the model. We already know that DeepSpeed ZeRO Stage 3 is compatible with QLoRA, so I'm wondering why I don't get that "activating zero.init()" line in my logs if the Trainer already works with DeepSpeed.

This is my setup by the way:

    # ------------------------- Training arguments -------------------------
    training_args = TrainingArguments(
        output_dir=output_dir,
        learning_rate=2e-5,
        per_device_train_batch_size=1,
        per_device_eval_batch_size=1,
        gradient_accumulation_steps=16,
        num_train_epochs=3,
        max_steps=-1,
        weight_decay=0.01,
        logging_strategy="steps",
        logging_steps=25,
        save_strategy="steps",
        eval_strategy="steps",
        eval_steps=25,
        save_steps=25,
        save_total_limit=2,
        load_best_model_at_end=True,
        metric_for_best_model="eval_loss",
        greater_is_better=False,
        label_names=["labels"],
        max_grad_norm=1.0,
        bf16=True,
        log_level="info",
        log_level_replica="warning",
        remove_unused_columns=False,
        eval_accumulation_steps=16,
    )

    # ------------------------- Load model and quantization-------------------------
    optional_kwargs = {}

    if args.quant and LLAMA:
        print("🔢 Using 4-bit quantization...")
        quant_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_use_double_quant=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_compute_dtype=DTYPE,
            bnb_4bit_quant_storage=DTYPE,
        )
        optional_kwargs["quantization_config"] = quant_config
    else:
        print("🔢 Loading model without quantization...")

    model = AutoModelForCausalLM.from_pretrained(
        MODEL_PATH,
        local_files_only=True,
        trust_remote_code=True,
        attn_implementation="sdpa",
        torch_dtype=DTYPE,
        **optional_kwargs
    )

    model.config.pad_token_id = tokenizer.pad_token_id
    if LLAMA:
        model.config.pretraining_tp = 1

    # ------------------------- Gradient Checkpointing -------------------------
    training_args.gradient_checkpointing = True
    model.config.use_cache = False
    training_args.gradient_checkpointing_kwargs = {"use_reentrant": True}

    # ------------------------- LORA -------------------------
    if args.lora:
        print("✨ Applying LoRA...")
        lora_config = LoraConfig(
            r=8,
            lora_alpha=16,
            target_modules="all-linear",
            lora_dropout=0.1,
            bias="none",
            task_type=TaskType.CAUSAL_LM,
        )
        model = prepare_model_for_kbit_training(model)
        model = get_peft_model(model, lora_config)

        model.print_trainable_parameters()

    # ------------------------- Trainer -------------------------
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=final_dataset["train"],
        eval_dataset=final_dataset["test"],
        data_collator=data_collator,
        # compute_metrics=custom_metrics,
        callbacks=trainer_callbacks
    )

    # ---------------------------- Train ----------------------------
    trainer.train()

accelerate config:

compute_environment: LOCAL_MACHINE
debug: true
deepspeed_config:
  deepspeed_config_file: "path/to/deepspeed_config_bf16.json"
  zero3_init_flag: true
distributed_type: DEEPSPEED
downcast_bf16: 'no'
machine_rank: 0
main_training_function: main
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false

deepspeed config:

{
    "bf16": { "enabled": true },
    "fp16": { "enabled": false },

    "zero_optimization": {
        "stage": 3,
        "overlap_comm": true,
        "reduce_scatter": true,
        "stage3_gather_16bit_weights_on_model_save": true,

        "offload_optimizer": {
            "device": "cpu",
            "pin_memory": true
        },
        "offload_param": {
            "device": "cpu",
            "pin_memory": true
        }
    },

    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": 1.0,

    "wall_clock_breakdown": false
}

Launch command:

accelerate launch --config_file=accelerate_config_deepspeed.yaml --num_machines=$SLURM_NNODES --machine_rank=$SLURM_NODEID causal_lm.py --config causal_lm_config.yaml


Perhaps related to this?