[discuss] approaches for reading order detection

MLLife · September 4, 2024, 8:57am

What models and datasets use ML techniques to find the reading order of layout sections in PDFs? An openly licensed model would be a plus.

kahun · September 4, 2024, 10:56am

To find the reading order of layout sections in PDFs using ML techniques, you can explore models and datasets like:

LayoutLM: A transformer model that combines text with 2D position (bounding box) embeddings to model document layout, which can help with reading order.
DocFormer: Focuses on document understanding and layout analysis.
OCR and Layout Analysis Datasets: Open datasets such as PubLayNet (layout regions) and DocVQA (document question answering) can be used for training models.

These models and datasets can help identify and process reading orders in PDFs.
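
Before training anything, it can help to have a non-ML baseline to compare against. The sketch below is only an illustration, not how LayoutLM or DocFormer work: it assumes you already have bounding boxes for each layout section (the `Box` class and `reading_order` function are hypothetical names), groups boxes into columns by horizontal overlap, then reads columns left to right and each column top to bottom. It will not handle full-width titles or irregular layouts, which is where the learned models come in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A layout section; y grows downward, as in most PDF/OCR outputs."""
    label: str
    x0: float
    y0: float
    x1: float
    y1: float


def reading_order(boxes):
    """Heuristic baseline: column grouping, then top-to-bottom per column."""
    columns = []  # each column is a list of boxes that overlap horizontally
    for box in sorted(boxes, key=lambda b: b.x0):
        for col in columns:
            col_x0 = min(b.x0 for b in col)
            col_x1 = max(b.x1 for b in col)
            # Strict horizontal overlap with the column's current extent
            if box.x0 < col_x1 and box.x1 > col_x0:
                col.append(box)
                break
        else:
            # No existing column overlaps this box, so start a new one
            columns.append([box])

    # Columns left to right, boxes inside each column top to bottom
    columns.sort(key=lambda col: min(b.x0 for b in col))
    return [b for col in columns for b in sorted(col, key=lambda b: b.y0)]


# Two-column page given in scrambled order
page = [
    Box("D", 60, 20, 100, 30),
    Box("A", 0, 0, 40, 10),
    Box("C", 60, 0, 100, 10),
    Box("B", 0, 20, 40, 30),
]
ordered_labels = [b.label for b in reading_order(page)]
```

Sorting the same boxes purely by their top coordinate would interleave the two columns, which is the main failure this kind of grouping avoids.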

juhoinkinen · September 14, 2024, 2:29pm

Surya is a document OCR toolkit that does:

OCR in 90+ languages that benchmarks favorably vs cloud services

Line-level text detection in any language

Layout analysis (table, image, header, etc detection)

Reading order detection

There seems to be an HF Space running it: Surya OCR - a Hugging Face Space by artificialguybr

MLLife · September 19, 2024, 11:53am

Thanks for the response. This seems to have a restrictive license. Is there anything with an open-use policy? I found one, but integration is still not straightforward: HURIDOCS/pdf-reading-order. Are there technical details about the features used? Does it use 2 models?
