LayoutLM model annotation regarding

Jayan · April 21, 2023, 9:59am

Hi,
I am trying to fine-tune the LayoutLM model on my custom invoice dataset. How should I label the data in the annotation phase? In the FUNSD dataset, the images are labelled in a question-answer (Q-A) format. How should I do this for invoices? For example, suppose the invoice contains the following:

ABC Technologies LLC
invoice number : 1234
invoice date : 21/04/2023

cog · May 3, 2023, 2:15am

Hi.

The data could be tagged like this.

First, label the data in the Q-A NER format.
In this case, the FUNSD labels are

{'header', 'question', 'answer', 'other'}

Then the text from your invoice would be tagged like this.

{"ABC Technologies LLC": header}
{"invoice number :": question} {"1234": answer}
{"invoice date :": question} {"21/04/2023": answer}

To use this NER-tagged data with the LayoutLM model, convert it to the model's input format.
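
Part of that input format is the bounding box of each word. LayoutLM expects box coordinates scaled to a 0-1000 range relative to the page size. Here is a minimal sketch of that scaling, assuming your OCR or annotation tool gives pixel boxes as (x0, y0, x1, y1); the function name and the sample page size and coordinates are made up for illustration.

```python
def normalize_bbox(bbox, page_width, page_height):
    """Scale a pixel box (x0, y0, x1, y1) to the 0-1000 range used by LayoutLM."""
    x0, y0, x1, y1 = bbox
    return [
        int(1000 * x0 / page_width),
        int(1000 * y0 / page_height),
        int(1000 * x1 / page_width),
        int(1000 * y1 / page_height),
    ]


# Hypothetical example: one word on a 1000 x 2000 pixel invoice scan.
page_width, page_height = 1000, 2000
pixel_box = (50, 100, 200, 150)  # invented coordinates, not from a real invoice
scaled_box = normalize_bbox(pixel_box, page_width, page_height)
```

Each word then carries three things into training: its text, its scaled box, and its tag.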

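LayoutLM fine-tuning on FUNSD uses BIO tags: each labelled segment is split into tokens, the first token gets a B- tag and the rest get I- tags.
Here is an example. It does not follow a formal tokenization rule; it is only meant to show the idea.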

{"ABC Technologies LLC": header}
-> {"ABC": B-header, "Technologies": I-header, "LLC": I-header}

{"invoice number :": question}
-> {"invoice": B-question, "number :": I-question}
...
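
The BIO conversion above can be sketched in a few lines of Python. This is only an illustration under some assumptions: segments are split on whitespace (so ":" becomes its own word, unlike the hand-written example), the 'other' label is mapped to a plain "O" tag, and the function name `to_bio` is hypothetical, not part of any library.

```python
def to_bio(segments):
    """Turn (text, label) segments into a list of (word, BIO tag) pairs."""
    tagged = []
    for text, label in segments:
        # Assumption: simple whitespace splitting, not a real tokenizer.
        for i, word in enumerate(text.split()):
            if label == "other":
                # Assumption: words outside any entity get the plain "O" tag.
                tagged.append((word, "O"))
            else:
                prefix = "B" if i == 0 else "I"
                tagged.append((word, f"{prefix}-{label}"))
    return tagged


# The invoice lines from the question, labelled at segment level.
segments = [
    ("ABC Technologies LLC", "header"),
    ("invoice number :", "question"),
    ("1234", "answer"),
    ("invoice date :", "question"),
    ("21/04/2023", "answer"),
]
bio_tags = to_bio(segments)
```

With this input, "ABC" gets B-header, "Technologies" and "LLC" get I-header, and each value such as "1234" gets B-answer because it is the first word of its own segment.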

Once the data is tagged this way, you should be able to fine-tune the LayoutLM model on it.

Jayan · May 3, 2023, 2:36am

Hi,

Thank you for the clarification.
I will try this and let you know.
