I am working on an OCR project to perform document understanding on a specific type of document with a fixed layout (e.g., mandate forms). The layout has minor variations across images. My goal is to accurately extract key-value pairs and output them in JSON format.
For dataset preparation, I have samples of the document on which I want to run OCR and extract the text accurately. When annotating the dataset, should I label the fields generically (every key as "key" and every value as "value"), or should I give each key and value its own specific label, such as Date_key, Date_value, AccountNo_key, Account_Value, etc.?
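To make the two options concrete, here is a rough sketch of how I picture them at the token level. The sample words, the BIO tagging scheme, and the helper names (`to_bio`, `build_label_map`) are my own assumptions for illustration; they are not something prescribed by LayoutLM or LiLT.

```python
# Hypothetical OCR words from one form (made up for illustration).
words = ["Date", ":", "12/03/2024", "Account", "No", ":", "123456"]

# Option A: generic labels, one class for all keys and one for all values.
generic = ["KEY", "KEY", "VALUE", "KEY", "KEY", "KEY", "VALUE"]

# Option B: field-specific labels, using the names from my question.
specific = [
    "Date_key", "Date_key", "Date_value",
    "AccountNo_key", "AccountNo_key", "AccountNo_key", "Account_Value",
]


def to_bio(labels):
    """Convert per-word labels to BIO tags.

    Simplification: consecutive words with the same label are treated as
    one entity, so two adjacent but separate entities of the same class
    would be merged.
    """
    tags = []
    prev = None
    for label in labels:
        if label == "O":
            tags.append("O")
            prev = None
        else:
            prefix = "I" if label == prev else "B"
            tags.append(f"{prefix}-{label}")
            prev = label
    return tags


def build_label_map(tags):
    """Map each tag to an integer id, with "O" always at index 0."""
    names = ["O"] + sorted(set(tags) - {"O"})
    return {name: idx for idx, name in enumerate(names)}


generic_bio = to_bio(generic)
specific_bio = to_bio(specific)

generic_map = build_label_map(generic_bio)
specific_map = build_label_map(specific_bio)

for word, g, s in zip(words, generic_bio, specific_bio):
    print(f"{word:12} {g:10} {s}")
```

With option A the JSON field name is not in the label, so I would still need a separate step to pair each value with its key. With option B the label itself names the field, at the cost of a larger label set.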
What is the best practice for annotating my dataset to train LayoutLM and LiLT models?