Index out of range in LayoutLM

I am trying to fine-tune LayoutLM for named entity extraction on SROIE receipts. I followed the LayoutLM GitHub page and ran their run_seq_labeling.py and preprocess.py on a new dataset I prepared, but I am getting the following error:

Iteration:   4%|█████▉                                                                                                                                                             | 21/577 [00:53<23:45,  2.56s/it]
Epoch:   0%|                                                                                                                                                                                | 0/100 [00:53<?, ?it/s]
Traceback (most recent call last):
  File "run_seq_labeling.py", line 812, in <module>
    main()
  File "run_seq_labeling.py", line 705, in main
    args, train_dataset, model, tokenizer, labels, pad_token_label_id
  File "run_seq_labeling.py", line 220, in train
    outputs = model(**inputs)
  File "/home/ml3/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ml3/.local/lib/python3.6/site-packages/layoutlm/modeling/layoutlm.py", line 221, in forward
    head_mask=head_mask,
  File "/home/ml3/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ml3/.local/lib/python3.6/site-packages/layoutlm/modeling/layoutlm.py", line 171, in forward
    input_ids, bbox, position_ids=position_ids, token_type_ids=token_type_ids
  File "/home/ml3/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ml3/.local/lib/python3.6/site-packages/layoutlm/modeling/layoutlm.py", line 82, in forward
    bbox[:, :, 2] - bbox[:, :, 0]
  File "/home/ml3/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ml3/.local/lib/python3.6/site-packages/torch/nn/modules/sparse.py", line 126, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/home/ml3/.local/lib/python3.6/site-packages/torch/nn/functional.py", line 1814, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self

I am using transformers 2.9, which the GitHub page lists as a requirement.

I found the issue. It turns out that the OCR detects vertical text, and in that case the box width comes out negative.

Hi,

I am facing the same index out of range problem. Could you elaborate a little more on your solution?

Thank you.

@keshav5196 Check your bounding boxes. In my case, I found that some of the handwritten text was vertical. LayoutLM doesn’t like that. Just remove boxes like that.
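A minimal sketch of that removal step, assuming each word comes with a box in (x0, y0, x1, y1) form. The function name `drop_inverted_boxes` and the sample data are hypothetical and not part of the LayoutLM repo; adapt it to wherever your preprocessing reads the OCR output.

```python
def drop_inverted_boxes(words, boxes):
    """Keep only the words whose box has x1 >= x0 and y1 >= y0.

    Hypothetical helper: `words` and `boxes` are parallel lists, and each
    box is an (x0, y0, x1, y1) tuple.
    """
    kept_words, kept_boxes = [], []
    for word, (x0, y0, x1, y1) in zip(words, boxes):
        if x1 >= x0 and y1 >= y0:
            kept_words.append(word)
            kept_boxes.append((x0, y0, x1, y1))
    return kept_words, kept_boxes


# Made-up sample: the third box has x1 < x0, as a rotated word might.
words = ["TOTAL", "12.50", "signature"]
boxes = [(100, 200, 180, 220), (300, 200, 360, 220), (500, 400, 480, 700)]

clean_words, clean_boxes = drop_inverted_boxes(words, boxes)
print(clean_words)
print(clean_boxes)
```

Dropping the word together with its box keeps the two lists aligned, so any labels attached to the words would need the same filtering.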

First, to be clear: by vertical, do you mean rotated 90 degrees? If a word is vertical, how would the bounding box be affected?

By the way, I checked my data and didn’t find any vertical words in any image.

Solved my problem. It was due to some boxes having a negative width or height.

For example, if an input box is (x0, y0, x1, y1) and either y1 - y0 or x1 - x0 is negative, LayoutLM will throw an error.

On GPU the error will be: CUDA error: device-side assert triggered
On CPU the error will be: IndexError: index out of range in self
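To find the offending boxes before training, something like the sketch below may help. It computes the width and height the same way the traceback shows the model doing (`bbox[:, :, 2] - bbox[:, :, 0]`), which then appears to be used as an index into an embedding table, so a negative value would fall out of range. The function name `find_negative_extents` and the sample boxes are hypothetical.

```python
def find_negative_extents(boxes):
    """Return (index, width, height) for every box with a negative extent.

    Hypothetical helper: each box is an (x0, y0, x1, y1) tuple.
    """
    problems = []
    for i, (x0, y0, x1, y1) in enumerate(boxes):
        width = x1 - x0
        height = y1 - y0
        if width < 0 or height < 0:
            problems.append((i, width, height))
    return problems


# Made-up sample: box 1 has a negative width, box 2 a negative height.
sample_boxes = [(0, 0, 10, 10), (10, 0, 5, 10), (0, 10, 10, 5)]

for index, width, height in find_negative_extents(sample_boxes):
    print(f"box {index}: width={width}, height={height}")
```

Running a check like this over the whole dataset shows which examples to fix or drop, and running one batch on CPU gives the more readable IndexError instead of the CUDA assert.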
