Jayan
April 21, 2023, 9:59am
1
Hi
I am trying to fine-tune a LayoutLM model on my custom invoice dataset. In the annotation phase, how should I label the data? In the FUNSD dataset, the images are labelled in a question-answer (Q-A) format. How should I do this for invoices? For example, suppose the invoice contains the following:
ABC Technologies LLC
invoice number : 1234
invoice date : 21/04/2023
cog
May 3, 2023, 2:15am
2
Hi.
The data could be tagged like this.
First, label the data in the Q-A NER format.
In this case, the FUNSD labels are
{'header', 'question', 'answer', 'other'}
Then the text from your invoice would be tagged like this:
{"ABC Technologies LLC":header}
{"invoice number : ":question}{"1234":answer}
{"invoice date : ":question}{"21/04/2023":answer}
To use this NER-tagged data with a LayoutLM model, convert the tags to the model's input format.
LayoutLM fine-tuning on FUNSD uses BIO tags, so that each word (token) in a labelled segment gets its own tag.
Here is an example. It does not follow a formal tokenization rule; it is only meant as an illustration.
{"ABC Technologies LLC":header}
-> {"ABC":B-header, "Technologies":I-header , "LLC":I-header }
{"invoice number : ":question}
->{"invoice":B-question,"number : ":I-question}
...
Once the data is tagged this way, fine-tuning the LayoutLM model should work.
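The conversion from segment labels to BIO tags could be sketched roughly as follows. This is only a minimal illustration: the `to_bio` helper and the `segments` list are hypothetical names, words are split on whitespace (so ":" becomes its own word, unlike in the hand-written example above), and mapping 'other' to a plain "O" tag is an assumed convention. Bounding boxes, which LayoutLM also needs, are left out.

```python
def to_bio(segments):
    """Turn (text, label) segments into a flat list of (word, BIO tag) pairs."""
    tagged = []
    for text, label in segments:
        # Assumption: simple whitespace splitting, not a real tokenizer.
        for i, word in enumerate(text.split()):
            if label == "other":
                tag = "O"  # assumed convention for the 'other' label
            else:
                # First word of a segment gets B-, the rest get I-.
                prefix = "B" if i == 0 else "I"
                tag = f"{prefix}-{label}"
            tagged.append((word, tag))
    return tagged


# Segment-level labels for the invoice lines from the question.
segments = [
    ("ABC Technologies LLC", "header"),
    ("invoice number :", "question"),
    ("1234", "answer"),
    ("invoice date :", "question"),
    ("21/04/2023", "answer"),
]

bio = to_bio(segments)
for word, tag in bio:
    print(word, tag)

# Token classification needs integer label ids; here they are built
# only from the tags that actually occur in this small example.
label_list = sorted({tag for _, tag in bio})
label2id = {tag: i for i, tag in enumerate(label_list)}
```

In a real dataset the label list would normally be fixed up front for all tags, rather than derived from a single document.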
Jayan
May 3, 2023, 2:36am
3
Hi
Thank you for the clarification.
I will try this and let you know.