Inference on Multi-GPU/multinode

gfatigati · December 21, 2022, 10:59am

Dear Hugging Face community,

I’m using OWL-ViT to analyze a large number of input images against a set of text labels. At the moment my code works well, but it runs on only one GPU:

import torch
from transformers import OwlViTForObjectDetection, OwlViTProcessor

model = OwlViTForObjectDetection.from_pretrained("google/owlvit-large-patch14")
processor = OwlViTProcessor.from_pretrained("google/owlvit-large-patch14")
....
inputs = processor(text=text_queries, images=image_rgb, return_tensors="pt").to(device)

# Set model in evaluation mode
model = model.to(device)
model.eval()

# Get predictions
with torch.no_grad():
    outputs = model(**inputs)
....

At the moment, it takes 4 hours to process 31,000 input images. Could you suggest how to change the code above so that it runs on multiple GPUs? The multi-GPU section of the Hugging Face guide is under construction. I’m using a supercomputer with 4 GPUs per node, and I would also like to run across multiple nodes if possible.
Thanks in advance.

IdoAmit198 · December 21, 2022, 8:08pm

You can try using accelerate.
In this link you can see how to modify code similar to yours to integrate the accelerate library, which can take care of the distributed setup for you.
I haven’t worked with it directly for long, so I may be forgetting specific details, such as whether you need to pass it your nodes/GPUs and, if so, how. But I’m sure you can easily find all those details.

By the way, I just came across this recent post, which might also come in handy for your needs.

gfatigati · December 22, 2022, 7:01am

Thanks,

from what I understand, accelerate is used to distribute training. Does it also work with inference?

IdoAmit198 · December 22, 2022, 7:35am

As far as I know, it does.
It should work the same way, except that you don’t need to initialize an optimizer, scheduler, etc. with the accelerator; you only set up the device, eval_dataloader, and model with it.
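
A minimal sketch of what the reply above describes, assuming the accelerate, transformers, torch, and Pillow packages are installed. The names `batches`, `run_inference`, `out_prefix`, the batch size of 8, and the output file naming are hypothetical choices for illustration, not something from the thread. Instead of an eval_dataloader, this sketch simply slices the list of image paths by process index, which is one simple way to give each GPU its own share of the images:

```python
from typing import Dict, List, Sequence


def batches(items: Sequence[str], batch_size: int) -> List[List[str]]:
    """Cut a sequence into consecutive batches of at most batch_size items."""
    return [list(items[i:i + batch_size]) for i in range(0, len(items), batch_size)]


def run_inference(image_paths: Sequence[str], text_queries: List[str],
                  out_prefix: str = "owlvit_preds", batch_size: int = 8) -> None:
    # Heavy imports live inside the function so the helper above can be used without them.
    import torch
    from accelerate import Accelerator
    from PIL import Image
    from transformers import OwlViTForObjectDetection, OwlViTProcessor

    # Under `accelerate launch`, one copy of this script runs per GPU.
    accelerator = Accelerator()
    device = accelerator.device

    processor = OwlViTProcessor.from_pretrained("google/owlvit-large-patch14")
    model = OwlViTForObjectDetection.from_pretrained("google/owlvit-large-patch14")
    model = model.to(device)
    model.eval()

    # Each process keeps every num_processes-th path, starting at its own index.
    my_paths = list(image_paths)[accelerator.process_index::accelerator.num_processes]

    results: Dict[str, dict] = {}
    for batch in batches(my_paths, batch_size):
        images = [Image.open(p).convert("RGB") for p in batch]
        # OWL-ViT expects one list of text queries per image in the batch.
        inputs = processor(text=[text_queries] * len(images), images=images,
                           return_tensors="pt").to(device)
        with torch.no_grad():
            outputs = model(**inputs)
        for i, path in enumerate(batch):
            results[path] = {"logits": outputs.logits[i].cpu(),
                             "pred_boxes": outputs.pred_boxes[i].cpu()}

    # Hypothetical output layout: one file of raw predictions per process.
    torch.save(results, f"{out_prefix}_rank{accelerator.process_index}.pt")
    accelerator.wait_for_everyone()
```

A script that calls `run_inference(paths, text_queries)` could then be started with `accelerate launch --num_processes 4 script.py` on one 4-GPU node. For several nodes, `accelerate config` asks for the number of machines and the main node's address. Since no gradients are exchanged, the processes never need to communicate during inference, so the ideal speed-up is close to the number of GPUs.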

If it doesn’t work for some reason, there are other wrappers for running distributed inference (which also give a speed-up), such as Optimum (made to accelerate inference).

It’s also worth mentioning that you can always do it the “hard” way and implement it yourself with torch.nn.DataParallel or torch.nn.parallel.DistributedDataParallel.
With DistributedDataParallel, you then launch your code via the torchrun console script. But again, I personally find this method harder than a wrapper like accelerate or Optimum.
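
For comparison, here is a rough sketch of the torchrun route, assuming torch, transformers, and Pillow are installed. torchrun exports the `RANK`, `WORLD_SIZE`, and `LOCAL_RANK` environment variables to every process it starts. Because inference involves no gradient synchronization, this sketch skips the DistributedDataParallel wrapper and only uses those variables to pick a GPU and a slice of the images. The names `rank_info`, `shard`, `infer_shard`, and the output file naming are hypothetical:

```python
import os
from typing import List, Sequence, Tuple


def rank_info() -> Tuple[int, int, int]:
    """Read the variables torchrun exports; fall back to a single process."""
    rank = int(os.environ.get("RANK", 0))
    world_size = int(os.environ.get("WORLD_SIZE", 1))
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    return rank, world_size, local_rank


def shard(items: Sequence, rank: int, world_size: int) -> List:
    """Return every world_size-th item, starting at index rank."""
    return list(items[rank::world_size])


def infer_shard(image_paths: Sequence[str], text_queries: List[str], out_dir: str) -> None:
    # Heavy imports live inside the function so the helpers above can be used without them.
    import torch
    from PIL import Image
    from transformers import OwlViTForObjectDetection, OwlViTProcessor

    rank, world_size, local_rank = rank_info()
    # LOCAL_RANK is the GPU index on this node; RANK is global across nodes.
    device = torch.device("cuda", local_rank) if torch.cuda.is_available() else torch.device("cpu")

    processor = OwlViTProcessor.from_pretrained("google/owlvit-large-patch14")
    model = OwlViTForObjectDetection.from_pretrained("google/owlvit-large-patch14")
    model = model.to(device)
    model.eval()

    results = {}
    for path in shard(image_paths, rank, world_size):
        image_rgb = Image.open(path).convert("RGB")
        inputs = processor(text=text_queries, images=image_rgb, return_tensors="pt").to(device)
        with torch.no_grad():
            outputs = model(**inputs)
        results[path] = {"logits": outputs.logits[0].cpu(),
                         "pred_boxes": outputs.pred_boxes[0].cpu()}

    # Hypothetical output layout: one file of raw predictions per global rank.
    torch.save(results, os.path.join(out_dir, f"preds_rank{rank}.pt"))
```

On a single 4-GPU node, a script that calls `infer_shard(...)` could be started with `torchrun --nproc_per_node=4 script.py`. Across nodes, torchrun additionally takes `--nnodes` plus rendezvous options such as `--rdzv_backend=c10d` and `--rdzv_endpoint=HOST:PORT`, run once per node, usually from the cluster's job script.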

gfatigati · January 12, 2023, 9:50am

Hi,

thanks for your reply. I’m reading the Optimum user guide, in particular the “Accelerated inference” section of the Quick tour.

That section uses a tokenizer and an ort_model. My actual code is quite different, and I don’t understand how to integrate Optimum with OWL-ViT:

model = OwlViTForObjectDetection.from_pretrained("google/owlvit-large-patch14")

processor = OwlViTProcessor.from_pretrained("google/owlvit-large-patch14")

Any help, or a reference to some examples? I didn’t find anything. Thanks.
