Use batching for ViLT predictions

Hi,
I'm trying to extract pooled embeddings from the ViLT model for 1000 objects (PIL image + text description) in Google Colab.
Code sample (based on this example):

import torch
from transformers import ViltProcessor, ViltModel

device = "cuda:0" if torch.cuda.is_available() else "cpu"

processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-mlm")
model = ViltModel.from_pretrained("dandelin/vilt-b32-mlm")
model = model.to(device)

# images: list of 1000 (256x256x3) PIL images
# texts: list of 1000 image descriptions
inputs = processor(images, texts,
                   return_tensors="pt",
                   padding=True).to(device)
outputs = model(**inputs)
result = outputs.pooler_output

It works fine while the number of objects is below 50, but beyond that point I get an out-of-memory error. It seems this memory problem could be solved by batching, but how can I apply it to this code?

The problem was solved by:

  1. Using the GPU instead of the CPU
  2. Implementing a for loop that processes the inputs in batches
  3. Converting each batch's output to a NumPy array

import gc
import numpy as np
from tqdm import tqdm

result = []
for step in tqdm(range(0, 1000, 10)):
    # Process one batch of 10 image-text pairs at a time
    inputs = processor(images[step: step + 10],
                       texts[step: step + 10],
                       return_tensors="pt",
                       padding=True)
    inputs = inputs.to(device)

    outputs = model(**inputs)
    # Move the pooled embeddings off the GPU and convert to NumPy
    pooler_output = outputs.pooler_output.cpu().detach().numpy()
    result.append(pooler_output)
    gc.collect()

# Stack the per-batch outputs into a single (1000, 768) matrix
result = np.concatenate(result, axis=0)
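
As a further optimization (not part of the original fix, just a sketch under the same setup): since no gradients are needed for embedding extraction, the forward pass can be wrapped in torch.no_grad(), which stops PyTorch from building the autograd graph and lowers peak memory per batch:

# Same batching loop as above, with gradient tracking disabled;
# .detach() is then no longer needed on the outputs.
result = []
with torch.no_grad():
    for step in tqdm(range(0, 1000, 10)):
        inputs = processor(images[step: step + 10],
                           texts[step: step + 10],
                           return_tensors="pt",
                           padding=True).to(device)
        outputs = model(**inputs)
        result.append(outputs.pooler_output.cpu().numpy())
result = np.concatenate(result, axis=0)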