How can I run a very large model like BLOOM on a cluster of machines?

Hello,

I can run OPT-66B on one server with 6 GPUs (24 GB each) by following your Hugging Face guide on loading big models: I pass a device_map. I can also run BLOOM on one server with 8 GPUs (24 GB each) by passing a device_map, but it offloads part of the model to the CPU and takes a long time to answer.

My question is how to run a very large model like BLOOM on a cluster of machines. BLOOM would need 20 GPUs with 24 GB each, so it needs a cluster of 3 machines with 8 GPUs each to deploy. With Accelerate this is not possible, as we are limited to a single machine. With DP and DDP it is not possible either, as the model spans more than one machine. I have tried everything: DeepSpeed Inference, the RPC framework, etc.

Thanks for your help.

Regards,
Pat
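
P.S. In case it helps, here is a rough sketch of the arithmetic behind the 20 GPU figure. It is only a back-of-the-envelope estimate and rests on assumptions of mine: BLOOM has 176B parameters, the weights are stored in 16-bit precision (2 bytes each), 1 GB is counted as 10^9 bytes, and a headroom factor of 1.3 stands in for activations and other runtime memory. The function names and the 1.3 factor are illustrative, not taken from any library.

```python
import math


def gpus_needed(num_params, bytes_per_param, gpu_gb, headroom=1.0):
    """Estimate how many GPUs are needed to hold the model weights.

    headroom is an assumed multiplier for activations and other
    runtime memory (1.0 means weights only).
    """
    total_bytes = num_params * bytes_per_param * headroom
    return math.ceil(total_bytes / (gpu_gb * 1e9))


def machines_needed(num_gpus, gpus_per_machine):
    """Number of machines needed to host the given GPU count."""
    return math.ceil(num_gpus / gpus_per_machine)


if __name__ == "__main__":
    bloom_params = 176e9  # assumption: BLOOM parameter count
    weights_only = gpus_needed(bloom_params, 2, 24)
    with_headroom = gpus_needed(bloom_params, 2, 24, headroom=1.3)
    print("GPUs for weights only:", weights_only)
    print("GPUs with assumed headroom:", with_headroom)
    print("Machines with 8 GPUs each:", machines_needed(with_headroom, 8))
```

Under these assumptions the weights alone do not fit on a single 8 GPU server, which matches the CPU offload I see, and the estimate with headroom lands on 3 machines.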