I am trying to extract pooled embeddings from a ViLT model for 1000 objects (each a PIL image plus a text description) in Google Colab.
Code sample (based on this example):
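import torch
from transformers import ViltProcessor, ViltModel

device = "cuda:0" if torch.cuda.is_available() else "cpu"
processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-mlm")
model = ViltModel.from_pretrained("dandelin/vilt-b32-mlm")
# All 1000 image/text pairs are encoded in a single call
inputs = processor(['List with 1000 (256x256x3) PIL images'],
                   ['List with 1000 image descriptions'],
                   return_tensors="pt", padding=True).to(device)
model = model.to(device)
# One forward pass over the whole dataset
outputs = model(**inputs)
result = outputs.pooler_output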
It works fine with fewer than 50 objects, but beyond that I get an out-of-RAM error. It seems that batching could solve this memory allocation problem, but how can I apply it to this code?
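For reference, here is a rough sketch of what I understand batching to mean in this setting: slice the two lists into aligned chunks, run the processor and the model on one chunk at a time, and concatenate the pooled outputs at the end. The helper names `iter_batches` and `embed_in_batches` and the default `batch_size=32` are hypothetical, not part of the ViLT example. Wrapping the loop in `torch.no_grad()`, passing `truncation=True`, and moving each result to the CPU are my own assumptions about what helps with memory and long descriptions, not something the example confirms.

```python
from typing import Iterator, Sequence, Tuple


def iter_batches(images: Sequence, texts: Sequence,
                 batch_size: int) -> Iterator[Tuple[Sequence, Sequence]]:
    """Yield aligned (images, texts) slices of at most batch_size items."""
    if len(images) != len(texts):
        raise ValueError("images and texts must have the same length")
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(images), batch_size):
        end = start + batch_size
        yield images[start:end], texts[start:end]


def embed_in_batches(images, texts, model, processor, device, batch_size=32):
    """Return pooled embeddings for all pairs, stacked along dimension 0.

    Assumes at least one image/text pair is given.
    """
    import torch  # imported here so iter_batches has no heavy dependencies

    model = model.to(device).eval()
    chunks = []
    with torch.no_grad():  # inference only: do not keep the autograd graph
        for batch_images, batch_texts in iter_batches(images, texts, batch_size):
            # Only this batch is converted to tensors and sent to the device
            inputs = processor(list(batch_images), list(batch_texts),
                               return_tensors="pt", padding=True,
                               truncation=True).to(device)
            outputs = model(**inputs)
            # Keep the result on the CPU so device memory is reused per batch
            chunks.append(outputs.pooler_output.cpu())
    return torch.cat(chunks, dim=0)
```

With the `model`, `processor`, and `device` from my code above, the call would presumably be `result = embed_in_batches(images, texts, model, processor, device)`, lowering `batch_size` if memory still runs out.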