Recommended Approach for Distributed Inference

You could create multiple ORTModelForXXX instances, assign each one to a different device, and then iterate over your dataset either synchronously or asynchronously, feeding batches to the models through a shared queue; a rough sketch of that pattern follows.
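
Below is a minimal sketch of that idea, not a tested recipe: one ORTModel instance per GPU, with worker threads draining a shared queue. The model id, the `ORTModelForSequenceClassification` task class, and the provider options are assumptions to make the example self-contained; adapt them to your own model and export settings.

```python
import queue
import threading

from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSequenceClassification

model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # hypothetical example model
tokenizer = AutoTokenizer.from_pretrained(model_id)

# One model instance per device (assumes onnxruntime-gpu with the CUDA execution
# provider is installed; use_io_binding=False keeps plain CPU tensors as inputs).
models = [
    ORTModelForSequenceClassification.from_pretrained(
        model_id,
        export=True,  # export the PyTorch checkpoint to ONNX on the fly
        provider="CUDAExecutionProvider",
        provider_options={"device_id": device_id},
        use_io_binding=False,
    )
    for device_id in (0, 1)
]

texts = ["first example", "second example", "third example", "fourth example"]

work: "queue.Queue[str]" = queue.Queue()
for text in texts:
    work.put(text)

results = []
results_lock = threading.Lock()


def worker(model):
    # Each thread drains the shared queue; ONNX Runtime releases the GIL during
    # inference, so the two devices can run concurrently from Python threads.
    while True:
        try:
            text = work.get_nowait()
        except queue.Empty:
            return
        inputs = tokenizer(text, return_tensors="pt")
        outputs = model(**inputs)
        with results_lock:
            results.append((text, outputs.logits.argmax(-1).item()))
        work.task_done()


threads = [threading.Thread(target=worker, args=(m,)) for m in models]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)
```

For an asynchronous variant, the same queue-and-worker structure works with `asyncio` and an executor per model, or with one process per device if you want full isolation between the ONNX Runtime sessions.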