I am looking to fine-tune a pipeline consisting of a stack of models (say A, B, and C), one of which (B) is an LLM (GPT-J) that occupies a lot of memory.
I want to split model B across 2 GPUs and place the other models in the pipeline on the same GPUs, in order (i.e. A and the first half of B on GPU_0; the second half of B and C on GPU_1).
Does DeepSpeed allow manual model parallelization, i.e. can I decide which model goes on what GPU? If not, what would be the best way to achieve pipeline parallelization with a split model?
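To make the intended placement concrete, here is a rough plain-PyTorch sketch of what I mean by manual model parallelism. The `nn.Linear` layers are just hypothetical stand-ins for A, the two halves of B, and C; the real question is whether DeepSpeed lets me express this kind of explicit device assignment.

```python
import torch
import torch.nn as nn

# Fall back to CPU when 2 GPUs are not available, so the sketch still runs.
dev0 = torch.device("cuda:0" if torch.cuda.device_count() >= 2 else "cpu")
dev1 = torch.device("cuda:1" if torch.cuda.device_count() >= 2 else "cpu")

class ManualPipeline(nn.Module):
    def __init__(self):
        super().__init__()
        self.A = nn.Linear(16, 16).to(dev0)        # model A on GPU_0
        self.B_first = nn.Linear(16, 16).to(dev0)  # first half of B on GPU_0
        self.B_second = nn.Linear(16, 16).to(dev1) # second half of B on GPU_1
        self.C = nn.Linear(16, 4).to(dev1)         # model C on GPU_1

    def forward(self, x):
        x = self.B_first(self.A(x.to(dev0)))
        x = x.to(dev1)                             # activations cross GPUs here
        return self.C(self.B_second(x))

out = ManualPipeline()(torch.randn(2, 16))
```

This is naive (no micro-batching, the GPUs idle while waiting on each other), which is why I'm asking whether DeepSpeed's pipeline parallelism can take over once the per-device assignment is fixed.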