From my understanding, one of the uses of `deepspeed` is to partition a model across multiple GPUs, i.e. to avoid loading the whole model into a single machine's GPU RAM + CPU RAM by distributing it instead.
Let's take the case where a single machine cannot hold the model (even combining GPU and CPU RAM), but 2 or more machines together can:
Since `deepspeed` is integrated into the pipeline only after the model is loaded, i.e. after running `AutoModelForCausalLM.from_pretrained`, which already raises an OOM error, doesn't that defeat the purpose of using `deepspeed` in this case?
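To make the flow I mean concrete, here is a minimal sketch of the usual setup (the model name and ZeRO config are just placeholders, not a recommendation):

```python
import deepspeed
from transformers import AutoModelForCausalLM

# Step 1: the full model is materialized here, in a single machine's
# CPU/GPU memory -- this is where the OOM happens for a large model.
model = AutoModelForCausalLM.from_pretrained("some-large-model")  # hypothetical name

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {"stage": 3},  # ZeRO-3 partitions parameters across GPUs
}

# Step 2: only now does DeepSpeed get a chance to partition the weights,
# which seems too late if step 1 already ran out of memory.
engine, _, _, _ = deepspeed.initialize(model=model, config=ds_config)
```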
Is there an alternative way to load the model in such cases (other than lowering the precision)?