I am currently using the finetuned LayoutLMv3 on the FUNSD dataset.

When I was using the model for new images, I noticed a problem using the processor.

encoding = processor(resized_image, words, boxes=boxes, return_offsets_mapping=True, return_tensors="pt")

The elements of boxes must be less than 1000, so I resized both the boxes and the image to have a maximum dimension of 1000.

Is that the correct way of doing things?
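For what it's worth, the usual approach for LayoutLM-style models is not to resize the image to 1000 pixels but to scale the box coordinates into the 0-1000 range relative to the original image size. A minimal sketch of that normalization (the helper name and the example dimensions here are my own, not from the post):

```python
# Scale a pixel-space box [x0, y0, x1, y1] into the 0-1000 range expected
# by LayoutLM-style processors, relative to the ORIGINAL image size.
def normalize_bbox(bbox, width, height):
    x0, y0, x1, y1 = bbox
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]

# Example: a box on a hypothetical 1654x2339 page
print(normalize_bbox([100, 200, 300, 400], 1654, 2339))  # → [60, 85, 181, 171]
```

This keeps the image itself untouched (the image processor handles the pixel resizing separately) and only maps the coordinates onto the 0-1000 grid.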

Then, I have encountered an error during

with torch.no_grad():
    outputs = model(**encoding)

"IndexError: index out of range in self"

Can anybody explain why this error occurs?


And lastly, are all the images resized to (224, 224)?

Here are the encodings.

for k,v in encoding.items():

print(k,v.shape)

Outputs:

input_ids torch.Size([1, 795])

attention_mask torch.Size([1, 795])

offset_mapping torch.Size([1, 795, 2])

bbox torch.Size([1, 795, 4])

pixel_values torch.Size([1, 3, 224, 224])