Hi, I am trying to use BLIP-2, but since the model is very large, I want to run it on multiple GPUs so that it fits in memory. I saw on the Optimum website that BLIP-2 is supported. I just wanted to know whether I should file a feature request, or if there is already a way to load BLIP-2 across multiple GPUs using Optimum?
@VedaantJain Hi, ONNX Runtime does not natively support multi-GPU inference. If you are using the transformers implementation of BLIP-2 with accelerate to dispatch the model across several GPUs (naive pipeline parallelism), you should be able to apply the BetterTransformer implementation smoothly with:
model = model.to_bettertransformer()
Feel free to open an issue on GitHub if you run into any problems.
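Roughly, a minimal sketch of what that setup could look like, assuming accelerate and optimum are installed (the checkpoint name is just the 2.7B example from the model card):

```python
import torch
from transformers import Blip2Processor, Blip2ForConditionalGeneration

# device_map="auto" lets accelerate split the model across the available GPUs
# (naive pipeline parallelism); requires `pip install accelerate`.
processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b",
    device_map="auto",
    torch_dtype=torch.float16,
)

# Swap in the BetterTransformer fastpath kernels (requires `pip install optimum`).
model = model.to_bettertransformer()
```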
It’s also worth trying int8/int4 quantization with bitsandbytes to reduce memory usage. This can be done by passing load_in_8bit=True or load_in_4bit=True to the from_pretrained method. See a full code example here: Salesforce/blip2-opt-2.7b · Hugging Face.
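For reference, a short sketch of the 8-bit variant, assuming bitsandbytes and accelerate are installed:

```python
from transformers import Blip2ForConditionalGeneration

# load_in_8bit=True quantizes the weights with bitsandbytes at load time;
# requires `pip install bitsandbytes accelerate`.
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b",
    load_in_8bit=True,
    device_map="auto",
)
```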