What are the different models and datasets that use ML techniques to find the reading order of layout sections in PDFs? Any open-license model would be a plus.
To find the reading order of layout sections in PDFs using ML techniques, you can explore models and datasets like:
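- LayoutLM: A transformer pre-trained jointly on text and 2D layout positions; it is commonly used as a backbone for document layout tasks, including reading-order prediction.
- DocFormer: A multimodal transformer for document understanding and layout analysis that combines text, visual, and spatial features.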
- OCR and layout analysis datasets: Open datasets like PubLayNet (layout detection) and DocVQA (document question answering) can be used for training models.
These models and datasets can help identify and process the reading order of PDFs.
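For comparison, a common non-ML baseline for ordering layout sections is the recursive XY-cut: split the page at the widest whitespace band (horizontal bands are read top-to-bottom, vertical gutters left-to-right) and recurse into each part. Below is a minimal sketch, assuming you already have layout boxes (for example from a layout detector) in page coordinates with y growing downward. `Box`, `xy_cut`, and `min_gap` are hypothetical names for illustration, not part of LayoutLM, DocFormer, or any other tool mentioned in this thread.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Box:
    """A detected layout region (hypothetical type): origin top-left, y grows downward."""
    label: str
    x0: float
    y0: float
    x1: float
    y1: float


def _widest_gap(spans: Sequence[Tuple[float, float]]) -> Tuple[float, Optional[float]]:
    """Return (width, cut position) of the widest whitespace gap between 1-D intervals,
    or (0.0, None) if the intervals leave no gap."""
    ordered = sorted(spans)
    best: Tuple[float, Optional[float]] = (0.0, None)
    covered_until = ordered[0][1]
    for lo, hi in ordered[1:]:
        gap = lo - covered_until
        if gap > best[0]:
            best = (gap, (covered_until + lo) / 2)
        covered_until = max(covered_until, hi)
    return best


def xy_cut(boxes: List[Box], min_gap: float = 5.0) -> List[Box]:
    """Order boxes by recursive XY-cut, always cutting at the widest whitespace gap."""
    if min_gap <= 0:
        raise ValueError("min_gap must be positive")
    if len(boxes) <= 1:
        return list(boxes)
    gap_y, cut_y = _widest_gap([(b.y0, b.y1) for b in boxes])
    gap_x, cut_x = _widest_gap([(b.x0, b.x1) for b in boxes])
    if cut_y is not None and gap_y >= min_gap and gap_y >= gap_x:
        # Horizontal whitespace band: read the upper block before the lower block.
        first = [b for b in boxes if b.y1 <= cut_y]
        rest = [b for b in boxes if b.y1 > cut_y]
    elif cut_x is not None and gap_x >= min_gap:
        # Vertical whitespace band (e.g. a column gutter): read left before right.
        first = [b for b in boxes if b.x1 <= cut_x]
        rest = [b for b in boxes if b.x1 > cut_x]
    else:
        # No usable whitespace left: fall back to top-to-bottom, left-to-right.
        return sorted(boxes, key=lambda b: (b.y0, b.x0))
    return xy_cut(first, min_gap) + xy_cut(rest, min_gap)


# A two-column page with a full-width title and footer, given in scrambled order.
page = [
    Box("right-2", 310, 310, 550, 500),
    Box("footer", 50, 520, 550, 540),
    Box("left-1", 50, 80, 290, 300),
    Box("title", 50, 20, 550, 60),
    Box("right-1", 310, 80, 550, 300),
    Box("left-2", 50, 310, 290, 500),
]
print([b.label for b in xy_cut(page)])
# ['title', 'left-1', 'left-2', 'right-1', 'right-2', 'footer']
```

Cutting at the widest gap (rather than always trying horizontal cuts first) is what keeps the two columns intact here: the 20-unit gutter beats the 10-unit gap between paragraphs. Learned reading-order models are mainly useful where whitespace alone is ambiguous, such as irregular layouts, sidebars, or figures spanning columns.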
Surya is a document OCR toolkit that does:
- OCR in 90+ languages that benchmarks favorably vs cloud services
- Line-level text detection in any language
- Layout analysis (table, image, header, etc. detection)
- Reading order detection
There seems to be a Hugging Face Space running it: Surya OCR, a Hugging Face Space by artificialguybr.
Thanks for the response. Surya seems to have a restrictive license, though. Is there anything with an open-use license? I found one, HURIDOCS/pdf-reading-order, but integrating it is still not straightforward. Are there technical details about the features it uses? Does it use two models?