Which is the correct bbox ocr level for LiLT? block level or word level?

anon29551288 · June 22, 2023, 11:09am

This is regarding the LiLT model, as discussed in the GitHub issue below:

Word or segment position embeddings?

opened 06:39PM - 21 Nov 22 UTC

closed 08:40AM - 24 Nov 22 UTC

Hi @jpWang, I had a question related to LiLT; namely whether or not you're leveraging bounding boxes per word or per segment when fine-tuning on FUNSD. The LayoutLMv3 authors saw a great boost in performance when employing the same bounding box coordinates for a set of words that make up a "segment", like an address on an invoice. They use the OCR engine to identify segments in a document, and then give the same bounding box coordinates to all the words that make up that segment (an idea which was introduced in [StructuralLM](https://arxiv.org/abs/2105.11210)). LayoutLMv1 and v2 both use "word position embeddings," which means that each individual word has its own bounding box coordinates. Does LiLT achieve 88% F1 on FUNSD with word position embeddings? Looking at [this file](https://github.com/jpWang/LiLT/blob/main/LiLTfinetune/data/datasets/funsd.py), it seems word position embeddings are used.

In the issue linked above, the author of LiLT mentions that the model is pretrained on “segment-level box”.

Questions

Which kind of OCR output does LiltModel assume: word-level boxes or “segment-level box”?
How can I ensure that the same box level (segment or word) is applied for both fine-tuning and inference?
Any pointers on implementing the correct OCR level using pytesseract?
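
For the pytesseract question, here is a minimal sketch of one way to get segment-level boxes. It assumes that a Tesseract text line (the `block_num`, `par_num`, `line_num` triple returned by `pytesseract.image_to_data`) is a reasonable stand-in for a "segment"; that mapping is my assumption, not something confirmed in the issue. Every word on a line gets the union box of that line, scaled to the 0-1000 range that LayoutLM-style models expect. The helper names (`normalize_box`, `segment_level_boxes`, `ocr_segment_boxes`) are hypothetical.

```python
def normalize_box(box, width, height):
    """Scale a pixel (x0, y0, x1, y1) box to the 0-1000 coordinate range."""
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]


def segment_level_boxes(data, width, height):
    """Return (words, boxes) where each word carries the box of its text line.

    `data` is a dict shaped like the result of
    pytesseract.image_to_data(image, output_type=Output.DICT).
    Treating one Tesseract line as one segment is an assumption of this sketch.
    """
    words, keys, pixel_boxes = [], [], []
    for i, text in enumerate(data["text"]):
        if not text.strip():
            continue  # page/block/paragraph/line rows carry no text
        x0, y0 = data["left"][i], data["top"][i]
        x1, y1 = x0 + data["width"][i], y0 + data["height"][i]
        words.append(text)
        keys.append((data["block_num"][i], data["par_num"][i], data["line_num"][i]))
        pixel_boxes.append((x0, y0, x1, y1))

    # Union of all word boxes that share the same line key.
    merged = {}
    for key, (x0, y0, x1, y1) in zip(keys, pixel_boxes):
        if key in merged:
            mx0, my0, mx1, my1 = merged[key]
            merged[key] = (min(mx0, x0), min(my0, y0), max(mx1, x1), max(my1, y1))
        else:
            merged[key] = (x0, y0, x1, y1)

    boxes = [normalize_box(merged[key], width, height) for key in keys]
    return words, boxes


def ocr_segment_boxes(image):
    """Run Tesseract on a PIL image and return segment-level (words, boxes)."""
    import pytesseract  # imported lazily so the helpers above work without it
    from pytesseract import Output

    data = pytesseract.image_to_data(image, output_type=Output.DICT)
    width, height = image.size
    return segment_level_boxes(data, width, height)
```

Calling the same `ocr_segment_boxes` function when building the fine-tuning dataset and at inference time is one simple way to keep the box level consistent between the two. For word-level boxes instead, skip the merge step and normalize each word's own box.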
