I am currently using a LayoutLMv3 model fine-tuned on the FUNSD dataset.
When I ran the model on new images, I noticed a problem with the processor:
encoding = processor(resized_image, words, boxes=boxes, return_offsets_mapping=True, return_tensors="pt")
The elements of boxes must be less than 1000, so I resized the boxes, and the image along with them, so that the maximum dimension is 1000. Is that the correct way of doing things?
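To be concrete, here is a minimal sketch of what I mean by resizing (the file name, word_boxes, and the sample box are illustrative, not my exact script):

from PIL import Image

image = Image.open("document.png").convert("RGB")
width, height = image.size

# Scale so the longest side becomes 1000, keeping the aspect ratio.
scale = 1000 / max(width, height)
resized_image = image.resize((int(width * scale), int(height * scale)))

# Apply the same scale to every box so all coordinates stay below 1000.
word_boxes = [[82, 41, 166, 67]]  # one [x0, y0, x1, y1] box per word, in pixels
boxes = [[int(coord * scale) for coord in box] for box in word_boxes]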
Then I encountered an error during inference:

with torch.no_grad():
    outputs = model(**encoding)

IndexError: index out of range in self

Can anybody explain why this error occurs?
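In case it helps diagnose this, here is the sanity check I would run next; my (unconfirmed) assumption is that a bbox coordinate outside [0, 1000], or a token id outside the vocabulary, would make an embedding lookup fail with exactly this IndexError:

# Check that every bbox coordinate is within the [0, 1000] range
# and every input id is within the tokenizer vocabulary.
bbox = encoding["bbox"]
print("bbox range:", bbox.min().item(), "to", bbox.max().item())
assert bbox.min() >= 0 and bbox.max() <= 1000, "bbox coordinate out of range"

input_ids = encoding["input_ids"]
print("max input id:", input_ids.max().item(), "vocab size:", len(processor.tokenizer))
assert input_ids.max().item() < len(processor.tokenizer), "token id out of vocab"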
And lastly, are all the images resized to (224, 224)?
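One way to check this, I believe, is to print the image-processing config (attribute name as in recent transformers releases; older versions expose it as processor.feature_extractor):

# LayoutLMv3's image processor resizes every page to a fixed size;
# by default this should print {"height": 224, "width": 224}.
print(processor.image_processor.size)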
Here are the encoding shapes:

for k, v in encoding.items():
    print(k, v.shape)
Outputs:
input_ids torch.Size([1, 795])
attention_mask torch.Size([1, 795])
offset_mapping torch.Size([1, 795, 2])
bbox torch.Size([1, 795, 4])
pixel_values torch.Size([1, 3, 224, 224])
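One more thing I am unsure about: should offset_mapping be popped from the encoding before the forward pass? My understanding is that LayoutLMv3's forward() does not accept it as a keyword argument, so I would do something like:

# Remove offset_mapping before inference and keep it only for
# mapping token-level predictions back to words afterwards.
offset_mapping = encoding.pop("offset_mapping")
with torch.no_grad():
    outputs = model(**encoding)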