Hi, I am trying to use BLIP-2, but since the model is very large, I want to run it on multiple GPUs so that it fits in memory. I saw on the Optimum website that BLIP-2 is supported. I just wanted to know whether I should file a feature request, or if there is already a way to load BLIP-2 across multiple GPUs using Optimum?
@VedaantJain Hi, ONNX Runtime does not natively support multi-GPU inference. If you are using the transformers implementation of BLIP-2 with accelerate to dispatch the model across several GPUs (naive pipeline parallelism), you should be able to apply the BetterTransformer implementation smoothly with:
model = model.to_bettertransformer()
Feel free to open an issue on GitHub if you run into any problems.
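Roughly, a minimal sketch of what that setup could look like, assuming accelerate and optimum are installed (the checkpoint name is just the 2.7B example from the model card):

```python
import torch
from transformers import Blip2Processor, Blip2ForConditionalGeneration

# device_map="auto" lets accelerate split the model across the available GPUs
# (naive pipeline parallelism); requires `pip install accelerate`.
processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b",
    device_map="auto",
    torch_dtype=torch.float16,
)

# Swap in the BetterTransformer fastpath kernels (requires `pip install optimum`).
model = model.to_bettertransformer()
```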
It’s also worth trying int8/int4 quantization with bitsandbytes to reduce memory usage. This can be done by passing load_in_8bit=True or load_in_4bit=True to the from_pretrained method. See a full code example here: Salesforce/blip2-opt-2.7b · Hugging Face.
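For reference, a short sketch of the 8-bit variant, assuming bitsandbytes and accelerate are installed:

```python
from transformers import Blip2ForConditionalGeneration

# load_in_8bit=True quantizes the weights with bitsandbytes at load time;
# requires `pip install bitsandbytes accelerate`.
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b",
    load_in_8bit=True,
    device_map="auto",
)
```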