Hey, we have a sample using the InstructPix2Pix diffusers pipeline. We observe that inference is faster on a multi-GPU instance than on a single-GPU instance. Is `pipe.to("cuda:" + gpu_id)` running the pipeline on multiple GPUs? If not, what explains the speedup on a multi-GPU machine vs. a single-GPU machine?
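For context, the relevant part of our handler looks roughly like this (a simplified sketch, not the exact sample; the pipeline class and `model_fn` shape are assumptions):

```python
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline

def model_fn(model_dir):
    # Load one copy of the pipeline and move it to a single device.
    # pipe.to("cuda:<id>") places the *whole* pipeline on that one GPU;
    # it does not shard or parallelize the model across GPUs.
    pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
        model_dir, torch_dtype=torch.float16
    )
    gpu_id = "0"  # hypothetical: each worker would need its own id
    return pipe.to("cuda:" + gpu_id)
```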
The sample you shared is not using the Hugging Face DLC, so we cannot help you with it.
Hi @philschmid,
I wrote the code above as an example of handling the multi-GPU problem described in
How to pass device_id to overriden functions? · Issue #66 · aws/sagemaker-huggingface-inference-toolkit · GitHub.
The huggingface-inference-toolkit itself probably needs a fix (see that same issue) to support multiple GPUs in this fashion.
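Something like the following is what I have in mind for the fix: forward the worker's context (the MMS convention) so `model_fn` can read the GPU id assigned to that worker. This is a sketch assuming the toolkit exposes the MMS-style `context.system_properties`, not a description of the toolkit's current API:

```python
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline

def model_fn(model_dir, context=None):
    # Each MMS worker is pinned to one GPU; system_properties carries its id.
    gpu_id = "0"
    if context is not None:
        gpu_id = str(context.system_properties.get("gpu_id"))
    pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
        model_dir, torch_dtype=torch.float16
    )
    # One full pipeline per worker, each on its own device: throughput scales
    # with GPU count because requests are served in parallel, which would
    # explain the observed multi-GPU speedup.
    return pipe.to("cuda:" + gpu_id)
```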