LayoutLMv3 processor error

I am currently using LayoutLMv3 fine-tuned on the FUNSD dataset.

When running the model on new images, I noticed a problem with the processor.

encoding = processor(resized_image, words, boxes=boxes, return_offsets_mapping=True, return_tensors="pt")

The elements of boxes must be less than 1000, so I resized both the boxes and the image so that the maximum dimension is 1000.
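
Concretely, something like this (a rough sketch; image is a PIL image and word_boxes are the original pixel-coordinate boxes):

# Rough sketch of the resizing described above: scale the image and the
# boxes together so the longest side is at most 1000 pixels.
width, height = image.size
scale = 1000 / max(width, height)
resized_image = image.resize((int(width * scale), int(height * scale)))
boxes = [[int(coord * scale) for coord in box] for box in word_boxes]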

Is that the correct way of doing things?

Then I encountered an error during

with torch.no_grad():
    outputs = model(**encoding)

"IndexError: index out of range in self"

Can anybody explain why this error occurs?

And lastly, are all the images resized to (224, 224)?
Here are the encodings.
for k, v in encoding.items():
    print(k, v.shape)

Outputs:
input_ids torch.Size([1, 795])
attention_mask torch.Size([1, 795])
offset_mapping torch.Size([1, 795, 2])
bbox torch.Size([1, 795, 4])
pixel_values torch.Size([1, 3, 224, 224])


Apparently there is a precedent for this. It seems the dataset and the model are incompatible, so you will probably need to normalize the dataset manually.
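
If it helps, for the LayoutLM family "normalizing" usually means rescaling each box into the 0-1000 coordinate system the model expects, independent of the image resolution. A minimal sketch, assuming PIL images and pixel-coordinate boxes (normalize_box and raw_boxes are placeholder names):

# Scale pixel-coordinate boxes to the 0-1000 range that LayoutLM-family
# models expect, regardless of the original image resolution.
def normalize_box(box, width, height):
    return [
        int(1000 * box[0] / width),
        int(1000 * box[1] / height),
        int(1000 * box[2] / width),
        int(1000 * box[3] / height),
    ]

width, height = image.size  # PIL image
boxes = [normalize_box(b, width, height) for b in raw_boxes]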


For me, normalization was not the problem.
As mentioned in the posts above, one of the recurring problems was that the bounding boxes were too small.

That was not the problem for me:

Code to check that each bbox is wider and taller than one pixel:

for bbox in bounding_boxes:
    assert bbox[2] - bbox[0] > 1
    assert bbox[3] - bbox[1] > 1

The problem was that the model's embedding layer was receiving indices it could not accept. This generally happens when the data sample is longer than 512 tokens; you have to set the truncation parameter to True so that the length never exceeds 512. Mine was 700.

encoding = processor(original_image, words, boxes=boxes, return_offsets_mapping=True, max_length=512, padding="max_length", truncation=True, return_tensors="pt")
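
In case it is useful, here is a sketch of sanity checks covering the two failure modes discussed in this thread (over-long sequences and out-of-range boxes); processor, model, and encoding refer to the snippets above:

import torch

# Sequence length must fit the model (512 for LayoutLMv3).
max_len = processor.tokenizer.model_max_length
assert encoding["input_ids"].shape[1] <= max_len, "sequence too long: pass truncation=True"

# Boxes must already be in the 0-1000 coordinate system.
assert 0 <= int(encoding["bbox"].min()) and int(encoding["bbox"].max()) <= 1000, "bbox values must lie in 0-1000"

# The model's forward() does not accept offset_mapping, so drop it first.
offset_mapping = encoding.pop("offset_mapping")

with torch.no_grad():
    outputs = model(**encoding)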

But I still have not figured out why the images are resized to (224, 224).

Thanks John6666


I’m glad you were able to resolve some of this.

But I still have not figured out why the images are resized to (224, 224).

Other models, such as SigLIP, also resize to around that size (though not exactly the same), so the current model is presumably designed for that level of resolution.
However, there is a lot I don't understand, such as why the resolution has to be reduced that much, and whether stretching, padding, or cropping gives better results.
Well, I don't mainly deal with VLMs or LLMs, so I'll assume that as long as things work, it's fine.
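
For what it's worth, the target size comes from the processor's image-processing config (LayoutLMv3's visual branch is a ViT-style patch embedding pretrained at 224x224). A quick way to inspect it, assuming a recent transformers version; older releases expose the same object as feature_extractor instead of image_processor:

from transformers import AutoProcessor

# Example with the base checkpoint; substitute the fine-tuned model actually used.
processor = AutoProcessor.from_pretrained("microsoft/layoutlmv3-base", apply_ocr=False)

# `size` is the resize target used for pixel_values.
print(processor.image_processor.size)        # e.g. {'height': 224, 'width': 224}
print(processor.tokenizer.model_max_length)  # 512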
