BLIP-2 on Optimum

Hi, I am trying to use BLIP-2, but as it is very large, I want to run it across multiple GPUs so it fits in memory. I saw that it is supported according to the Optimum website. Should I file a feature request, or is there already a way to load BLIP-2 with Optimum on multiple GPUs?


Warm Regards,
Vedaant Jain

@VedaantJain Has the model been exported to ONNX or is it a PyTorch model?

@VedaantJain Hi, ONNX Runtime does not support multi-GPU inference natively. If you are using the transformers implementation of BLIP-2 with accelerate to dispatch it across several GPUs (naive pipeline parallelism), you should be able to use the BetterTransformer implementation with:

model = model.to_bettertransformer()

smoothly. Feel free to open an issue on GitHub if you run into any problems.
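For intuition, here is a toy sketch of the kind of layer-to-GPU assignment that naive pipeline parallelism implies: consecutive transformer layers are placed on GPUs in contiguous chunks. The function name and the even split are illustrative assumptions, not accelerate's actual implementation (which also accounts for per-layer memory footprints):

```python
# Toy illustration of naive pipeline parallelism: assign consecutive
# transformer layers to GPUs in contiguous, roughly even chunks, the
# way a simple device_map would. NOT accelerate's real code.
def make_device_map(num_layers: int, num_gpus: int) -> dict:
    """Map layer names to GPU indices in contiguous, roughly even chunks."""
    per_gpu = -(-num_layers // num_gpus)  # ceiling division
    return {f"layers.{i}": i // per_gpu for i in range(num_layers)}

# Example: 32 transformer layers spread over 4 GPUs.
dm = make_device_map(32, 4)
print(dm["layers.0"], dm["layers.8"], dm["layers.31"])  # 0 1 3
```

In the real library, passing device_map="auto" to from_pretrained makes accelerate compute such a map for you and move each submodule to its assigned device.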

Oh okay, I think I will use the transformers implementation then. Thank you!

It’s also worth trying int8/int4 quantization with bitsandbytes to reduce memory usage. This can be done by passing load_in_8bit=True or load_in_4bit=True to the from_pretrained method. See a full code example here: Salesforce/blip2-opt-2.7b · Hugging Face.
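To see why 8-bit loading cuts memory, here is a minimal sketch of absmax int8 quantization, the core idea behind bitsandbytes. This is a simplified illustration only; the real library quantizes per block and handles outlier features separately:

```python
# Simplified absmax int8 quantization: store weights as int8 plus one
# float scale, recovering approximate values on the fly. Only the core
# idea behind bitsandbytes' LLM.int8(), not its actual implementation.
def quantize_int8(weights: list) -> tuple:
    """Scale weights so the largest magnitude maps to 127, round to int8."""
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize_int8(quantized: list, scale: float) -> list:
    """Recover approximate float weights from int8 values and the scale."""
    return [q * scale for q in quantized]

w = [0.5, -1.27, 0.02, 1.0]
q, s = quantize_int8(w)
restored = dequantize_int8(q, s)
# Each value now takes 1 byte instead of 2 (fp16) or 4 (fp32),
# at the cost of a small rounding error.
print(max(abs(a - b) for a, b in zip(w, restored)) < 0.01)  # True
```

With load_in_8bit=True, transformers applies this kind of quantization to the model weights at load time, roughly halving memory relative to fp16.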