Save double load in BLIP 2?

Hello!
I am using the standard way of doing visual question answering for a given image, but I want to ask a lot of questions for each image. Sending all questions together does not work, at least in my experience (see the batching sketch at the end of this post), so I send the questions one by one. The problem is that in

inputs3 = processor3(raw_image, questionX, return_tensors="pt").to(device, torch.float16)
out3 = model3.generate(**inputs3, max_length=64, min_length=12)

the second line (the out3 = model3.generate(…) call) takes a lot of time. Okay, not minutes, just maybe two seconds, but I ask those 10 questions in a loop for thousands of images, so I would like to find a smarter way if possible.

My question is: is it possible to replace only the question while keeping the processed image information? The image is always the same within the 10 questions per image, and surely processing the image takes most of the time, not the question. As I do not know the technique behind it, it is of course possible that image and question can only be processed together, and not one by one. Hopefully this is not the case. :wink:
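To make it concrete, here is a sketch of what I hope is possible. I am assuming that the processor can be called with only images or only text, and that model3.generate() accepts pixel_values and input_ids as separate arguments; I have not verified either, and if the vision encoder still runs inside generate() on every call, this would only save the preprocessing step:

import torch
device = "cuda" if torch.cuda.is_available() else "cpu"
# run the image through the processor once per image
image_inputs = processor3(images=raw_image, return_tensors="pt").to(device, torch.float16)
# questions stands for my 10 questions per image
for question in questions:
    # tokenize only the question and reuse the cached pixel_values
    text_inputs = processor3(text=question, return_tensors="pt").to(device)
    out3 = model3.generate(pixel_values=image_inputs.pixel_values, input_ids=text_inputs.input_ids, attention_mask=text_inputs.attention_mask, max_length=64, min_length=12)
    print(processor3.decode(out3[0], skip_special_tokens=True))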

I am loading the checkpoint with

from transformers import AutoProcessor, Blip2ForConditionalGeneration
processor3 = AutoProcessor.from_pretrained("Salesforce/blip2-flan-t5-xxl")
model3 = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-flan-t5-xxl", load_in_8bit=True, device_map="auto")

on an RTX 4090.
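
For reference, by "sending all questions together" I mean something like the following sketch (a padded batch of the same image repeated for each question; the questions list is a placeholder). The answers I get this way are not usable:

questions = ["What is shown in the picture?", "What color is the sky?"]  # placeholders for my 10 questions
inputs3 = processor3(images=[raw_image] * len(questions), text=questions, padding=True, return_tensors="pt").to(device, torch.float16)
out3 = model3.generate(**inputs3, max_length=64, min_length=12)
for seq in out3:
    print(processor3.decode(seq, skip_special_tokens=True))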

Best regards
Marc