Save double load in BLIP 2?

Hello!
I am using the standard way of doing visual question answering for a given image, but I want to ask a lot of questions for each image. Sending all questions together does not work, at least in my experience (see the batching sketch at the end of this post), so I send the questions one by one. The problem is that in

inputs3 = processor3(raw_image, questionX, return_tensors="pt").to(device, torch.float16)
out3 = model3.generate(**inputs3, max_length=64, min_length=12)

the second line (the out3 = model3.generate(…) call) takes a lot of time. Okay, not minutes, just maybe two seconds, but I ask those 10 questions in a loop for thousands of images, so I would like to find a smarter way if possible.

My question is: is it possible to replace only the question while keeping the processed image information? The image is always the same within the 10 questions per image, and surely processing the image takes most of the time, not the question. As I do not know the technique behind it, it is of course possible that image and question can only be processed together, and not one by one. Hopefully this is not the case. :wink:
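To make it concrete, here is a sketch of what I hope is possible. I am assuming that the processor can be called with only images or only text, and that model3.generate() accepts pixel_values and input_ids as separate arguments; I have not verified either, and if the vision encoder still runs inside generate() on every call, this would only save the preprocessing step:

import torch
device = "cuda" if torch.cuda.is_available() else "cpu"
# run the image through the processor once per image
image_inputs = processor3(images=raw_image, return_tensors="pt").to(device, torch.float16)
# questions stands for my 10 questions per image
for question in questions:
    # tokenize only the question and reuse the cached pixel_values
    text_inputs = processor3(text=question, return_tensors="pt").to(device)
    out3 = model3.generate(pixel_values=image_inputs.pixel_values, input_ids=text_inputs.input_ids, attention_mask=text_inputs.attention_mask, max_length=64, min_length=12)
    print(processor3.decode(out3[0], skip_special_tokens=True))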

I am loading the checkpoint with

from transformers import AutoProcessor, Blip2ForConditionalGeneration
processor3 = AutoProcessor.from_pretrained("Salesforce/blip2-flan-t5-xxl")
model3 = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-flan-t5-xxl", load_in_8bit=True, device_map="auto")

on an RTX 4090.
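
For reference, by "sending all questions together" I mean something like the following sketch (a padded batch of the same image repeated for each question; the questions list is a placeholder). The answers I get this way are not usable:

questions = ["What is shown in the picture?", "What color is the sky?"]  # placeholders for my 10 questions
inputs3 = processor3(images=[raw_image] * len(questions), text=questions, padding=True, return_tensors="pt").to(device, torch.float16)
out3 = model3.generate(**inputs3, max_length=64, min_length=12)
for seq in out3:
    print(processor3.decode(seq, skip_special_tokens=True))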

Best regards
Marc