This is regarding the LiLT model.
In the linked page, the author of LiLT mentions that the model is pretrained on “segment-level boxes”.
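My understanding of “segment-level boxes” is an assumption on my part, not something stated in the link. With word-level OCR every word keeps its own box. With segment-level boxes every word in the same text segment (for example, one OCR line) is given the single box that encloses the whole segment. The sketch below shows that difference. `union_box`, `to_segment_level` and the segment ids are hypothetical names used only for illustration.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1)


def union_box(boxes: List[Box]) -> Box:
    """Smallest box enclosing all the given word boxes."""
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


def to_segment_level(word_boxes: List[Box], segment_ids: List[int]) -> List[Box]:
    """Replace each word box with the box of the segment the word belongs to.

    segment_ids[i] is a hypothetical segment index for word i (e.g. its OCR line).
    """
    segments: Dict[int, List[Box]] = {}
    for box, seg in zip(word_boxes, segment_ids):
        segments.setdefault(seg, []).append(box)
    seg_box = {seg: union_box(bs) for seg, bs in segments.items()}
    return [seg_box[seg] for seg in segment_ids]


# Example: three words on one line, "Total" on the next line.
words = ["Invoice", "Number:", "12345", "Total"]
word_boxes = [(10, 10, 60, 20), (65, 10, 120, 22), (125, 11, 170, 21), (10, 40, 50, 50)]
segment_ids = [0, 0, 0, 1]
print(to_segment_level(word_boxes, segment_ids))
# [(10, 10, 170, 22), (10, 10, 170, 22), (10, 10, 170, 22), (10, 40, 50, 50)]
```

The words themselves are unchanged. Only the box attached to each word changes, so the same word list can be fed to the model with either granularity.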
- Which kind of OCR does LiltModel assume: word-level (token-level) boxes or “segment-level boxes”?
- How do I ensure that the same OCR granularity (“segment-level” or word-level boxes) is used for both fine-tuning and inference?
- Any pointers on implementing the correct OCR level using pytesseract?
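Here is a minimal sketch of what I am currently trying, assuming pytesseract and Pillow are installed. It also assumes that one Tesseract line (the same page/block/paragraph/line numbers in `image_to_data` output) is a reasonable stand-in for a “segment”. That choice is my own and is not confirmed by the LiLT authors. Boxes are scaled to the LayoutLM-style 0-1000 range, which I believe is what the `bbox` input of LiltModel expects. The idea for keeping fine-tuning and inference consistent is to route both through one function with one fixed `segment_level` setting. `normalize_box`, `words_and_boxes` and `ocr_image` are hypothetical helper names.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1)


def normalize_box(box: Box, width: int, height: int) -> Box:
    """Scale pixel coordinates to the 0-1000 range used by LayoutLM-style models."""
    x0, y0, x1, y1 = box
    return (
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    )


def words_and_boxes(data: Dict[str, list], width: int, height: int,
                    segment_level: bool = True) -> Tuple[List[str], List[Box]]:
    """Turn pytesseract.image_to_data(..., output_type=Output.DICT) into (words, boxes).

    With segment_level=True every word gets the box of its OCR line
    (assumption: line == segment); otherwise each word keeps its own box.
    """
    words, boxes, keys = [], [], []
    for i, text in enumerate(data["text"]):
        # Level 5 rows are words; skip page/block/line rows and empty text.
        if int(data["level"][i]) != 5 or not str(text).strip():
            continue
        left, top = data["left"][i], data["top"][i]
        words.append(str(text))
        boxes.append((left, top, left + data["width"][i], top + data["height"][i]))
        keys.append((data["page_num"][i], data["block_num"][i],
                     data["par_num"][i], data["line_num"][i]))

    if segment_level:
        # Grow one enclosing box per OCR line, then give it to every word on that line.
        line_boxes: Dict[tuple, Box] = {}
        for key, box in zip(keys, boxes):
            x0, y0, x1, y1 = line_boxes.get(key, box)
            line_boxes[key] = (min(x0, box[0]), min(y0, box[1]),
                               max(x1, box[2]), max(y1, box[3]))
        boxes = [line_boxes[k] for k in keys]

    return words, [normalize_box(b, width, height) for b in boxes]


def ocr_image(path: str, segment_level: bool = True) -> Tuple[List[str], List[Box]]:
    """Run Tesseract on an image file; use this same call for training data and inference."""
    import pytesseract  # imported here so the helpers above work without Tesseract
    from PIL import Image

    image = Image.open(path).convert("RGB")
    data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
    return words_and_boxes(data, image.width, image.height, segment_level)
```

Usage would be something like `words, boxes = ocr_image("page.png")` both when building the fine-tuning set and at inference time. Splitting the pure `words_and_boxes` step from the Tesseract call makes it easy to check the box grouping on a hand-written dict. It also means switching between segment-level and word-level boxes is a single flag rather than two separate code paths.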