How to annotate a custom invoice dataset for LayoutLM fine-tuning

I am trying to fine-tune a LayoutLM model on my custom invoice dataset. How should I label the data in the annotation phase? In the FUNSD dataset, the images are labelled in a question-answer format. How would I do that for invoices? For example, suppose the invoice contains the following:

ABC Technologies LLC
invoice number : 1234
invoice date : 21/04/2023


The data could be tagged like this.

First, label the data in a question-answer NER format, as FUNSD does.
The FUNSD label set is:

{'header', 'question', 'answer', 'other'}

The lines from your invoice would then be tagged like this:

{"ABC Technologies LLC":header}
{"invoice number : ":question}{"1234":answer}
{"invoice date :" :question} {"21/04/2023":answer}

To use this NER-tagged data with LayoutLM, you then need to convert it into the model's input format.

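The LayoutLM fine-tuning setup for FUNSD uses BIO tags: each labelled entity is split into words, the first word gets a B- (begin) tag, and the following words get I- (inside) tags.
Here is an example. It is only an illustration, not the exact tokenization rule.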

{"ABC Technologies LLC":header} 
-> {"ABC":B-header, "Technologies":I-header ,  "LLC":I-header } 

{"invoice number : ":question}
->{"invoice":B-question,"number : ":I-question}
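The same conversion can be sketched in Python. This is only a rough illustration, not the official FUNSD preprocessing: the `to_bio` helper and the `entities` list are made up for this example, words are split on whitespace (so ":" becomes its own word, unlike in the hand-written example above), and mapping the 'other' label to a plain "O" tag is an assumption based on the usual BIO convention.

```python
def to_bio(entities):
    """Convert (text, label) entity pairs into per-word BIO tags.

    Hypothetical helper for illustration only. Words are split on
    whitespace; in practice the words usually come from the OCR output.
    """
    tagged = []
    for text, label in entities:
        for i, word in enumerate(text.split()):
            if label == "other":
                # Assumption: 'other' maps to the plain outside tag "O".
                tag = "O"
            else:
                # First word of an entity gets B-, the rest get I-.
                prefix = "B" if i == 0 else "I"
                tag = f"{prefix}-{label}"
            tagged.append((word, tag))
    return tagged


# Entity-level annotations for the invoice lines from the question.
entities = [
    ("ABC Technologies LLC", "header"),
    ("invoice number :", "question"),
    ("1234", "answer"),
    ("invoice date :", "question"),
    ("21/04/2023", "answer"),
]

for word, tag in to_bio(entities):
    print(word, tag)
```

Each printed line is one word with its BIO tag, which is the word-level form that then gets mapped onto the model's tokens.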

Once the data is tagged this way, you should be able to fine-tune the LayoutLM model on it.
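One more step you will probably run into: the tokenizer may split a single word into several sub-word tokens, so the word-level tags have to be spread over those tokens. A common convention is to give the tag to the first sub-token and mark the rest with -100, which PyTorch's CrossEntropyLoss ignores by default. Below is a rough sketch of that idea. `align_labels`, `toy_tokenize`, and `tag_to_id` are hypothetical names for this example, and `toy_tokenize` is only a stand-in for a real tokenizer.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss


def align_labels(words, tags, tokenize, tag_to_id):
    """Spread word-level tags over sub-word tokens.

    The first sub-token of each word keeps the word's label id; the
    remaining sub-tokens get IGNORE_INDEX so the loss skips them.
    """
    tokens, label_ids = [], []
    for word, tag in zip(words, tags):
        pieces = tokenize(word)
        if not pieces:
            continue  # skip words the tokenizer drops entirely
        tokens.extend(pieces)
        label_ids.append(tag_to_id[tag])
        label_ids.extend([IGNORE_INDEX] * (len(pieces) - 1))
    return tokens, label_ids


def toy_tokenize(word):
    # Hypothetical stand-in for a real sub-word tokenizer:
    # cuts a word into chunks of at most 4 characters.
    return [word[i:i + 4] for i in range(0, len(word), 4)]


words = ["ABC", "Technologies", "LLC"]
tags = ["B-header", "I-header", "I-header"]
tag_to_id = {"O": 0, "B-header": 1, "I-header": 2}

tokens, label_ids = align_labels(words, tags, toy_tokenize, tag_to_id)
print(tokens)
print(label_ids)
```

With a real tokenizer you would pass its word-splitting function in place of `toy_tokenize`; the alignment logic stays the same.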


Thank you for the clarification.
I will try this and let you know.