Recommended Approach for Distributed Inference

I am looking to run inference with Optimum in a distributed PyTorch setting (multi-node, multi-CPU/GPU). Is there a recommended approach for this? My data comes from a HF Datasets object.

I tried using this solution with the HF Trainer, but it gives me an error when I run it with an Optimum model (the Optimum model does not have an eval() method).

You could create multiple ORTModelForXXX instances, each using a different device, and then iterate over your dataset either synchronously or asynchronously with a queue.

So in this solution, would I be using something like torch.distributed to manage the processes and aggregate the prediction results? Are there any code examples you could point me to?

Or just Python threads; there is no need for torch.distributed. No, we don't have any code examples for that.
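A minimal sketch of the threads-plus-queue pattern described above. The `predict` function here is a hypothetical placeholder for real inference; in practice each worker would hold its own ORTModelForXXX instance bound to a different device (the commented load call is an assumption to verify against your Optimum version):

```python
import threading
import queue

# Hypothetical stand-in for per-device Optimum inference. A real worker
# would load one model per device, roughly (assumed API, check your
# optimum version):
#   from optimum.onnxruntime import ORTModelForSequenceClassification
#   model = ORTModelForSequenceClassification.from_pretrained(...)
def predict(worker_id, example):
    # Placeholder "inference": real code would run the ORT session here.
    return {"worker": worker_id, "length": len(example["text"])}

def run_inference(dataset, num_workers=2):
    """Fan examples out to worker threads via a queue, collect results."""
    tasks = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker(worker_id):
        while True:
            item = tasks.get()
            if item is None:          # sentinel: no more work
                tasks.task_done()
                return
            out = predict(worker_id, item)
            with lock:                # results list is shared across threads
                results.append(out)
            tasks.task_done()

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    # A datasets.Dataset iterates example-by-example just like this list.
    for example in dataset:
        tasks.put(example)
    for _ in threads:
        tasks.put(None)               # one sentinel per worker
    for t in threads:
        t.join()
    return results

# Usage with a plain list standing in for a datasets.Dataset:
preds = run_inference([{"text": "hello"}, {"text": "distributed inference"}])
```

Each worker pulls examples off the shared queue until it sees a sentinel, so the dataset is load-balanced across devices without torch.distributed; aggregation is just the shared `results` list guarded by a lock.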