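I want to implement pipeline parallelism for huge LMs (hosted on the Hugging Face Hub) that do not fit on a single GPU. I tried DeepSpeed, but a model from transformers cannot be pipelined as-is, because it is not a sequential module (DeepSpeed's PipelineModule expects the model to be expressed as a sequence of layers).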
So is there any API that converts a transformers model into a sequential one?
Or is there a better way to fit huge LMs across multiple GPUs?
Thanks in advance.