Inference on Multi-GPU/multinode

Dear Huggingface community,

I’m using OWL-ViT to analyze a large number of input images against a set of text labels. At the moment, my code works well but runs on just 1 GPU:

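import torch
from transformers import OwlViTForObjectDetection, OwlViTProcessor

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = OwlViTForObjectDetection.from_pretrained("google/owlvit-large-patch14").to(device)
processor = OwlViTProcessor.from_pretrained("google/owlvit-large-patch14")
# text_queries (the labels) and image_rgb (an RGB image) are defined elsewhere
inputs = processor(text=text_queries, images=image_rgb, return_tensors="pt").to(device)

# Set model in evaluation mode
model = model.eval()

# Get predictions
with torch.no_grad():
    outputs = model(**inputs)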

At the moment, it takes 4 hours to process 31,000 input images. Could you suggest how to change the above code so that it runs on multiple GPUs? The multi-GPU guide section on Hugging Face is under construction. I’m using a supercomputing machine with 4 GPUs per node. I would also like to run on multiple nodes if possible.
Thanks in advance.


You can try using accelerate.
In this link you can see how to modify code similar to yours to integrate the accelerate library, which can take care of the distributed setup for you.
I haven’t worked with it directly for long, so I may be forgetting some specifics, such as whether you need to tell it about your nodes/GPUs and, if so, how. I’m sure you can easily find those details :)

By the way, I just came across this recent post, which might also come in handy.


From what I understand, accelerate is used to distribute training. Does it also work for inference?

As far as I know, it does.
It should work the same way, except that you don’t need to initialize an optimizer, scheduler, etc. with the accelerator. You only set up the device, the eval dataloader, and the model with it.
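To make that concrete, here is a minimal sketch of multi-GPU inference with accelerate. It is not taken from the linked guide: it assumes a recent accelerate release that provides `PartialState.split_between_processes` and `accelerate.utils.gather_object`, and the image folder layout, `TEXT_QUERIES`, `BATCH_SIZE`, the `0.1` threshold, and the `batched` helper are all placeholders made up for illustration. Each process loads its own copy of the model, takes its own slice of the image paths, and the results are gathered at the end.

```python
import sys
from pathlib import Path

TEXT_QUERIES = ["a photo of a cat", "a photo of a dog"]  # placeholder labels
BATCH_SIZE = 8  # assumed value; tune to your GPU memory


def batched(items, size):
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def main(image_dir):
    # Heavy imports live here so the helper above can be imported cheaply.
    import torch
    from accelerate import PartialState
    from accelerate.utils import gather_object
    from PIL import Image
    from transformers import OwlViTForObjectDetection, OwlViTProcessor

    state = PartialState()  # one process per GPU under `accelerate launch`
    name = "google/owlvit-large-patch14"
    processor = OwlViTProcessor.from_pretrained(name)
    model = OwlViTForObjectDetection.from_pretrained(name).to(state.device).eval()

    # Assumption: all images are .jpg files directly inside image_dir.
    paths = sorted(str(p) for p in Path(image_dir).glob("*.jpg"))
    results = []
    # Each process receives its own contiguous slice of the path list.
    with state.split_between_processes(paths) as my_paths:
        for chunk in batched(my_paths, BATCH_SIZE):
            images = [Image.open(p).convert("RGB") for p in chunk]
            # One list of text queries per image in the batch.
            inputs = processor(
                text=[TEXT_QUERIES] * len(images), images=images, return_tensors="pt"
            ).to(state.device)
            with torch.no_grad():
                outputs = model(**inputs)
            # PIL gives (width, height); post-processing expects (height, width).
            sizes = torch.tensor([im.size[::-1] for im in images], device=state.device)
            detections = processor.post_process_object_detection(
                outputs=outputs, threshold=0.1, target_sizes=sizes
            )
            for path, det in zip(chunk, detections):
                results.append({
                    "path": path,
                    "boxes": det["boxes"].cpu().tolist(),
                    "scores": det["scores"].cpu().tolist(),
                    "labels": det["labels"].cpu().tolist(),
                })

    all_results = gather_object(results)  # collect every process's results
    if state.is_main_process:
        print(f"processed {len(all_results)} images")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

On a single node with 4 GPUs this would be started with something like `accelerate launch --num_processes 4 owlvit_infer.py /path/to/images` (the script name is hypothetical). For several nodes, `accelerate launch` also takes `--num_machines`, `--machine_rank`, `--main_process_ip` and `--main_process_port`; how those are filled in depends on your cluster's scheduler.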

If that doesn’t work for some reason, there are other wrappers for running distributed inference (which also give a speed-up), such as Optimum (made to accelerate inference).

It’s also worth mentioning that you can always do it the “hard” way and implement this yourself with torch.nn.DataParallel or torch.nn.parallel.DistributedDataParallel.
With the latter, you then launch your code via the torchrun console script. But again, I personally find this method harder than a wrapper like accelerate or Optimum.
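For inference specifically, a lighter variant of the torchrun route is possible: since no gradients need to be synchronized, each process can take its own slice of the images without wrapping the model in DistributedDataParallel. The sketch below shows only the sharding logic under that assumption. torchrun exports `RANK`, `WORLD_SIZE` and `LOCAL_RANK` to every process; `shard_for_rank`, `dist_env`, `detect`, the `.jpg` glob and the output file naming are hypothetical names chosen for illustration, and `detect` is a stand-in for the actual OWL-ViT forward pass.

```python
import json
import os
import sys
from pathlib import Path


def dist_env():
    """Read the rank info that torchrun exports; fall back to a single process."""
    rank = int(os.environ.get("RANK", 0))
    world_size = int(os.environ.get("WORLD_SIZE", 1))
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    return rank, world_size, local_rank


def shard_for_rank(items, rank, world_size):
    """Strided split: rank r gets items r, r + world_size, r + 2 * world_size, ..."""
    return items[rank::world_size]


def detect(path, device):
    """Hypothetical placeholder: run the OWL-ViT forward pass for one image here."""
    return {"path": path, "device": device}


def main(image_dir, out_dir):
    rank, world_size, local_rank = dist_env()
    device = f"cuda:{local_rank}"  # one GPU per process on each node
    paths = sorted(str(p) for p in Path(image_dir).glob("*.jpg"))
    my_paths = shard_for_rank(paths, rank, world_size)
    results = [detect(p, device) for p in my_paths]
    # Each rank writes its own file, so no inter-process communication is needed.
    out_file = Path(out_dir) / f"detections_rank{rank}.json"
    out_file.write_text(json.dumps(results))


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

With 4 GPUs per node and 2 nodes, this could be launched on each node with something like `torchrun --nnodes=2 --nproc_per_node=4 --rdzv_backend=c10d --rdzv_endpoint=HOST:PORT shard_infer.py images/ out/`, where the script name, `HOST:PORT` and the directories are placeholders and `out/` must already exist. The per-rank JSON files can be merged afterwards.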



Thanks for your reply. I’m reading the Optimum user guide, in particular the “Accelerated inference” section of the Quick tour.

That section uses a tokenizer and an ort_model. My actual code is quite different, and I don’t understand how to integrate Optimum with OWL-ViT:

model = OwlViTForObjectDetection.from_pretrained("google/owlvit-large-patch14")

processor = OwlViTProcessor.from_pretrained("google/owlvit-large-patch14")

Could you help, or point me to some reference examples? I didn’t find anything. Thanks