After checking the Memory Requirement and applying DeepSpeed ZeRO stage 3 to my script, I tried running it on a PC with 2 GPUs. However, DeepSpeed creates one process per GPU, and RAM usage ends up at double the memory requirement. I suspect this is because each process loads its own copy of the model, so each one offloads its parameters and optimizer state to the CPU, which doubles the RAM usage.
Is there a way to force DeepSpeed to create only one process that uses both GPUs, instead of one process per GPU?
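
To make my suspicion concrete, here is a rough back-of-the-envelope sketch. It does not call DeepSpeed at all. The function name `host_ram_gb` and the 30 GB figure are made up for illustration, and the model (every process keeps its own full copy in CPU memory) is only my assumption about what is happening, not something I have confirmed.

```python
def host_ram_gb(full_copy_gb: float, num_processes: int) -> float:
    """Estimate host RAM if every process keeps its own full copy in CPU memory.

    This is an assumed model of the behaviour, not measured DeepSpeed behaviour.
    """
    return full_copy_gb * num_processes


# Hypothetical size of one full copy of the offloaded state, in GB.
full_copy_gb = 30.0

one_process = host_ram_gb(full_copy_gb, num_processes=1)
two_processes = host_ram_gb(full_copy_gb, num_processes=2)

print(f"1 process:   {one_process} GB")
print(f"2 processes: {two_processes} GB")
print(f"ratio:       {two_processes / one_process}")
```

If this assumption is right, RAM usage would scale with the number of processes, which would match the doubling I see with 2 GPUs.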