If I use the DeepSpeed integration with multiple GPUs, am I doing model parallelism or just data parallelism?
I know the DeepSpeed library may be capable of model parallelism, but I am still learning it. Is there a way to do model parallelism in transformers?
Also, is there any documentation or an example for the parallelize method in T5 (or any other model)?
Thank you for your time!