Is the CPU-offloading function in Accelerate the same as in DeepSpeed?

Hi, I’m using the Accelerate framework to offload the weight parameters to CPU DRAM for DNN inference.

To achieve this, I'm referring to Accelerate's device_map, as described in Handling big models for inference.
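
For reference, this is roughly how I'm loading the model (the checkpoint name and memory budgets below are just placeholders for my actual setup):

```python
from transformers import AutoModelForCausalLM

# Placeholder checkpoint and memory limits; capping GPU memory forces
# Accelerate to place the remaining layers in CPU DRAM.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-6.7b",
    device_map="auto",                       # let Accelerate split layers across devices
    max_memory={0: "8GiB", "cpu": "30GiB"},  # GPU 0 budget + CPU DRAM budget
    torch_dtype="auto",
)
```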

However, I recently came across another document discussing DeepSpeed's ZeRO-3 offload, which seems to offer similar functionality.

I'm wondering if these two approaches are the same or if there are differences between them.
Specifically, am I effectively using DeepSpeed just by passing device_map when loading the pretrained model?

Hello, no, they are different. device_map does naive pipelining (different layers on different GPUs/CPU RAM/disk), while DeepSpeed does parameter + optimizer + gradient sharding across GPUs and then offloads those partitions to CPU. DeepSpeed ZeRO-3 is generally used for training; Accelerate's device_map is generally used for big model inference.
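
You can see what naive pipelining means by inspecting the device map Accelerate computes: every module is assigned to exactly one device. A minimal sketch (the checkpoint name and memory figures are made up):

```python
from accelerate import infer_auto_device_map, init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("facebook/opt-6.7b")  # placeholder checkpoint
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config)  # no weights allocated yet

# Each module lands on exactly one device: GPU 0, "cpu", or "disk" -
# there is no sharding of a single parameter across devices.
device_map = infer_auto_device_map(model, max_memory={0: "8GiB", "cpu": "30GiB"})
print(device_map)
```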


Oh yes, I know there is a much bigger difference than just offloading parameters from GPU to CPU when training.
But I'm only using it for inference.
As far as I know, during inference DeepSpeed ZeRO-3 does nothing more than offload parameters to CPU DRAM, so I thought the two were the same. Isn't that right?

During inference too, there is a clear difference between ZeRO-3 and device_map/naive pipelining. ZeRO-3 runs inference on a different mini-batch on each GPU, leading to higher throughput, whereas device_map runs the same batch while hopping across GPUs, leading to lower throughput.
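
For comparison, ZeRO-3 inference with parameter offload is driven by a DeepSpeed config rather than a device_map. A minimal sketch based on the Transformers DeepSpeed integration (the checkpoint name and sizes are placeholders, and I'm assuming a recent transformers/deepspeed version):

```python
import deepspeed
from transformers import AutoModelForCausalLM
from transformers.integrations import HfDeepSpeedConfig

ds_config = {
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu", "pin_memory": True},  # shard params and offload them to CPU DRAM
    },
    "train_micro_batch_size_per_gpu": 1,  # required by DeepSpeed even for inference
    "bf16": {"enabled": True},
}

# Must exist before from_pretrained so weights are loaded in ZeRO-3 (sharded) fashion.
dschf = HfDeepSpeedConfig(ds_config)

model = AutoModelForCausalLM.from_pretrained("facebook/opt-6.7b")  # placeholder checkpoint
engine = deepspeed.initialize(model=model, config_params=ds_config)[0]
engine.module.eval()
# Each rank now runs its own mini-batch; parameters are gathered on the fly from CPU/other ranks.
```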


Yes, I think I have read that sentence in the device_map documentation, but when it comes to a single GPU and a fixed batch size, don't we conclude that the two are the same?