ZeRO 2 and 3 with Tensor Parallelism

conceptofmind · July 3, 2022, 7:02pm

Hi,

In DeepSpeed, ZeRO 2 was disabled with Pipeline Parallelism because of computational inefficiencies: ZeRO 2 partitions gradients, while PP accumulates them. I believe ZeRO 3 and Tensor Parallelism are complementary, but I am unsure whether the same holds for ZeRO 2.

I was wondering whether anyone has noticed similar inefficiencies, or any other issues, when using ZeRO 2 or 3 with Tensor Parallelism in Accelerate.

Before I refactor a model to use tensor parallelism, I wanted to ensure that it would still be completely compatible with ZeRO 2 or 3.
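
For context, here is a rough sketch of the per-GPU model-state memory I would expect from combining the two. It assumes mixed-precision Adam with the usual accounting of 2 bytes per parameter for fp16 weights, 2 for fp16 gradients, and 12 for fp32 optimizer states. It also assumes tensor parallelism splits parameters evenly across ranks, and it ignores activations and communication buffers. The function name and arguments are hypothetical and not part of DeepSpeed or Accelerate.

```python
def zero_bytes_per_gpu(num_params, dp_degree, tp_degree=1, stage=0):
    """Approximate model-state bytes per GPU (mixed-precision Adam).

    Assumption: tensor parallelism splits parameters evenly across
    tp_degree ranks, and ZeRO shards across the dp_degree data-parallel ranks.
    """
    local_params = num_params / tp_degree  # parameters held by one TP rank
    params = 2 * local_params   # fp16 weights
    grads = 2 * local_params    # fp16 gradients
    optim = 12 * local_params   # fp32 weights, momentum, variance

    if stage >= 1:  # ZeRO 1 partitions optimizer states
        optim /= dp_degree
    if stage >= 2:  # ZeRO 2 additionally partitions gradients
        grads /= dp_degree
    if stage >= 3:  # ZeRO 3 additionally partitions parameters
        params /= dp_degree
    return params + grads + optim


if __name__ == "__main__":
    # Hypothetical setup: 7.5B parameters, 8-way data parallel, 2-way tensor parallel.
    for stage in range(4):
        gb = zero_bytes_per_gpu(7.5e9, dp_degree=8, tp_degree=2, stage=stage) / 1e9
        print(f"ZeRO {stage}: {gb:.1f} GB per GPU")
```

By this estimate the memory savings simply multiply, so my question is only about throughput and correctness, not memory.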

Thank you,

Enrico
