How to deploy large-model inference on multiple machines with multiple GPUs?

You need an LLM engineer for this.

You won't be able to load 70B, especially if all 4 machines are separate…
I would try libraries such as [InternLM/lmdeploy](https://github.com/InternLM/lmdeploy), which is the fastest one I've seen, or [lm-sys/FastChat](https://github.com/lm-sys/FastChat), which may be easier to deploy across separate machines at once: you spin up a worker on every machine (see the sketch below).
Or, if you go with TorchServe, make sure you quantize the models.
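
As a rough illustration of the FastChat route, here is a minimal Python sketch of launching one controller plus a worker per machine. The hostnames, ports, and model path are placeholder assumptions; the `fastchat.serve` flags follow FastChat's documented CLI, but check the README for your installed version.

```python
# Sketch: launch FastChat serving processes across machines.
# Hostnames, ports, and the model path are placeholders (assumptions).
import subprocess

CONTROLLER_HOST = "10.0.0.1"   # machine that runs the controller (placeholder)
CONTROLLER_PORT = 21001

def start_controller():
    # Run this on one machine only.
    subprocess.Popen([
        "python3", "-m", "fastchat.serve.controller",
        "--host", "0.0.0.0",
        "--port", str(CONTROLLER_PORT),
    ])

def start_worker(this_host: str, worker_port: int = 21002,
                 model_path: str = "lmsys/vicuna-13b-v1.5"):
    # Run this on every GPU machine; each worker registers with the controller.
    subprocess.Popen([
        "python3", "-m", "fastchat.serve.model_worker",
        "--model-path", model_path,
        "--controller-address", f"http://{CONTROLLER_HOST}:{CONTROLLER_PORT}",
        "--worker-address", f"http://{this_host}:{worker_port}",
        "--host", "0.0.0.0",
        "--port", str(worker_port),
    ])
```

Once workers are registered, you can put FastChat's OpenAI-compatible API server (`fastchat.serve.openai_api_server`) in front of the controller and send requests to any worker through one endpoint.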

I would suggest llama-13b or llama-7b quantized to 8-bit. Keep in mind that you need about 2 GB of VRAM for every parallel request.
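
If you go the 8-bit route with Hugging Face transformers, a minimal loading sketch looks like the following. It assumes `transformers`, `accelerate`, and `bitsandbytes` are installed, and the model id is a placeholder for whichever llama-13b checkpoint you actually have access to.

```python
# Sketch: load a llama-13b checkpoint in 8-bit so it fits on a single 24-40 GB GPU.
# The model id below is a placeholder (assumption).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "huggyllama/llama-13b"  # placeholder checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # let accelerate place layers on the available GPUs
    load_in_8bit=True,   # bitsandbytes int8 quantization
)

prompt = "Explain tensor parallelism in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```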