LayoutLMV3 for Token Classification

WaterKnight · August 7, 2022, 1:24pm

Thanks in advance for implementing this model in the HuggingFace library

I annotated several images using the Label Studio ML Backend Tesseract example: label-studio-ml-backend/label_studio_ml/examples/tesseract at master · heartexlabs/label-studio-ml-backend · GitHub

[GIF: ls_demo_ocr]

With this tool you draw a box with the selected label and it extracts the text for you. You can see this in the GIF above.

After that, I exported the annotations and created a dataset using the bbox format expected by the model (I saw this here).

Finally, I trained the model for Token Classification.

However, the model is not working well at inference time. At inference time I set the processor to apply OCR:

processor = AutoProcessor.from_pretrained("microsoft/layoutlmv3-base", apply_ocr=True)

And I just pass an image:

encoding = processor(image, truncation=True, return_tensors="pt")

The model doesn't classify the tokens well. However, if I pass the bboxes and text from my annotations, it works properly.

How is this model supposed to be used for inference? Do you need to pass the hand-drawn bboxes and text?

I want to use this model to extract information automatically and if I have to pass these annotations manually it makes no sense.

Maybe I did something wrong at labelling? Should I run the image through tesseract and then label all the bboxes it returns instead of drawing them by hand?

WaterKnight · August 7, 2022, 7:16pm

I passed all images through EasyOCR and annotated all the boxes with Label Studio, following this tutorial: Label Studio Blog — Improve OCR quality for receipt processing with Tesseract and Label Studio

Then I trained the model, and at inference time I use the boxes and text from EasyOCR.

nielsr · August 8, 2022, 10:29am

Hi,

Thanks for your interest in LayoutLMv3. That labelling tool looks nice!

I’d say that you need to make sure the OCR settings are identical between training and inference, otherwise the model will not work as expected at inference time.

E.g., are you making sure bounding boxes are provided in the appropriate format during training? I’d check things such as:

Each bounding box should be a normalized version in (x0, y0, x1, y1) format, where (x0, y0) corresponds to the position of the upper left corner in the bounding box, and (x1, y1) represents the position of the lower right corner.

So make sure to provide the same settings for the bounding boxes during training vs inference, make sure you provide them in the right order (from top left in the document to bottom right), etc.
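A minimal sketch of the two checks above, in plain Python. It assumes the 0-1000 normalization scale used in the LayoutLM documentation, and the helper names `normalize_box` and `sort_reading_order` are made up for illustration. The sort is a crude approximation of reading order (top-left corner only), so words on a slightly slanted line may need something smarter.

```python
def normalize_box(box, width, height):
    """Scale a pixel-space (x0, y0, x1, y1) box to the 0-1000 range.

    Assumption: 0-1000 is the scale the LayoutLM family expects.
    """
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]


def sort_reading_order(words, boxes):
    """Order words top-to-bottom, then left-to-right, by top-left corner.

    Hypothetical helper: a naive stand-in for a real reading-order step.
    """
    order = sorted(range(len(boxes)), key=lambda i: (boxes[i][1], boxes[i][0]))
    return [words[i] for i in order], [boxes[i] for i in order]


# Example: a 500x1000 pixel page with one box.
print(normalize_box((50, 100, 150, 200), width=500, height=1000))
```

Whatever normalization and ordering you pick, apply exactly the same functions when building the training set and when preparing inputs at inference time.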

WaterKnight · August 8, 2022, 2:15pm

Yes, the problem was that when I annotated the images I drew the boxes and put the text.

What I did was send all the images through EasyOCR and then annotate the resulting bounding boxes with Label Studio. The guide I followed was: Label Studio Blog — Improve OCR quality for receipt processing with Tesseract and Label Studio; the only change I made was swapping Tesseract for EasyOCR.

With these annotations my model worked fine at inference time. I just needed to send the image to EasyOCR first.

purnasai · September 29, 2022, 7:21am

Hi @WaterKnight, for token classification, besides the labels we want to classify, don't we need one more label named “others”?

So when you used Label Studio, did you select only the text that belongs to the labels we need to classify, or did you also select the text that belongs to the “others” label?

Just want to know. Thanks

jp1773hsu · January 24, 2023, 3:29pm

I have the same question as @purnasai. When I train and test on a custom dataset, the model performs nicely on the test set; however, when I follow the inference guide using an image very similar to a test image, the model performs poorly. The inference output has many “other” bounding boxes, and the predictions for the two label classes are wrong. I did not annotate “other” labels during labeling; I only used two label classes, which are inferred nicely on the test set.

When labeling training images, do I need to identify the words I’m not interested in as “other” along with the other two classes that I am interested in?

purnasai · December 10, 2023, 6:01am

Hi @jp1773hsu, did you find a solution?

artpods56 · June 19, 2025, 1:41pm

The OCR engine you use for inference should be the same one you used for training. By default, the Transformers AutoProcessor uses PyTesseract under the hood when the apply_ocr attribute is set to True. So the solution is to use the same OCR engine during inference and to manually pass the words and bboxes to the processor.
Please correct me if I’m wrong, but I feel like this is the case.
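To make that concrete, here is a rough sketch, assuming EasyOCR was the engine used for the training annotations (as in WaterKnight's case). The helper names `easyocr_to_words_boxes` and `encode_with_external_ocr` are hypothetical, and the 0-1000 box scale is taken from the LayoutLM documentation rather than from this thread. The processor is created with apply_ocr=False, since it is now given the words and boxes instead of running PyTesseract itself. Note that EasyOCR returns text segments rather than single words, so the granularity should match whatever the model was trained on.

```python
def easyocr_to_words_boxes(results, width, height):
    """Convert EasyOCR detections to texts and 0-1000 normalized boxes.

    Assumes each detection is (corner_points, text, confidence), where
    corner_points holds four [x, y] pixel coordinates.
    """
    words, boxes = [], []
    for points, text, _confidence in results:
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        # Axis-aligned (x0, y0, x1, y1) box around the four corners.
        boxes.append([
            int(1000 * min(xs) / width),
            int(1000 * min(ys) / height),
            int(1000 * max(xs) / width),
            int(1000 * max(ys) / height),
        ])
        words.append(text)
    return words, boxes


def encode_with_external_ocr(image_path):
    """Hypothetical end-to-end helper: EasyOCR first, then the processor."""
    # Heavy imports are local so the conversion helper works without them.
    import easyocr
    from PIL import Image
    from transformers import AutoProcessor

    image = Image.open(image_path).convert("RGB")
    reader = easyocr.Reader(["en"])  # assumption: English documents
    results = reader.readtext(image_path)
    words, boxes = easyocr_to_words_boxes(results, *image.size)

    # apply_ocr=False: words and boxes come from our own OCR step.
    processor = AutoProcessor.from_pretrained(
        "microsoft/layoutlmv3-base", apply_ocr=False
    )
    return processor(image, words, boxes=boxes, truncation=True, return_tensors="pt")
```

In practice you would probably load the processor saved alongside your fine-tuned checkpoint instead of the base model name used here.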
