Any model that takes in a clean PDF and outputs JSON listing all the fillable fields that should be added to it, with coordinates?

John6666 · March 24, 2025, 4:32pm

That’s certainly not something that can be done with a non-AI library alone…

That said, for images of text in printed (non-handwritten) fonts, deep-learning-based libraries are useful up to a point.

Anyway, at the moment I don’t think it’s easy to do this with a single open-source AI model…
I think the first step is to divide the process into stages and plan which programs and models should handle each one.
Of course, it might be possible to use a VLM or multimodal LLM alone, either by fine-tuning it for this purpose or by using an extremely large model regardless of cost…
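A minimal sketch of that kind of division, in Python and under stated assumptions: each stage is a plain function, and only the detection stage would be backed by a model (for example a layout model or a VLM). The names `Detection`, `run_pipeline`, and the three stage signatures are hypothetical placeholders, not any library's API; a real renderer would rasterize the PDF pages and a real detector would run inference.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Detection:
    """One candidate fillable field (hypothetical structure)."""
    page: int
    label: str  # e.g. "text", "checkbox", "signature" (assumed label set)
    box: tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels, top-left origin
    score: float  # detector confidence in [0, 1]


# Stage signatures (assumptions for illustration only).
Renderer = Callable[[str], list]                         # PDF path -> list of page images
Detector = Callable[[int, object], list[Detection]]      # (page index, image) -> detections
PostProcessor = Callable[[list[Detection]], list[dict]]  # detections -> JSON-ready dicts


def run_pipeline(pdf_path: str, render: Renderer, detect: Detector,
                 post: PostProcessor) -> list[dict]:
    """Render pages, run the (model-backed) detector per page, then post-process."""
    pages = render(pdf_path)
    detections: list[Detection] = []
    for index, image in enumerate(pages):
        detections.extend(detect(index, image))
    return post(detections)


if __name__ == "__main__":
    # Stand-in stages so the wiring can be exercised without any model.
    def fake_render(path):
        return ["page-0-image"]

    def fake_detect(page_index, image):
        return [Detection(page_index, "text", (150, 300, 450, 345), 0.9)]

    def keep_confident(dets):
        return [{"page": d.page, "type": d.label, "box": list(d.box)}
                for d in dets if d.score >= 0.5]

    print(run_pipeline("form.pdf", fake_render, fake_detect, keep_confident))
```

Keeping the stages separate means the detector can be swapped (say, a LayoutLM-style model versus a VLM) without touching rendering or the JSON output step.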

However, if the task is beyond what the LayoutLM series can handle, I think a combined approach is more realistic. Even among existing AI-based services, the ones that stand out tend to use AI for the core step and handle pre- and post-processing with conventional code, rather than relying solely on AI. This is especially advantageous when accuracy matters.
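As an illustration of the conventional post-processing side, here is a plain-Python sketch (no model involved) that turns raw detections into JSON with coordinates. Assumptions: detections arrive as dicts with `page`, `label`, `box` (pixels, top-left origin) and `score`; pages were rendered at a known `dpi`; all pages share one height; and pages have no rotation and a MediaBox starting at (0, 0). It drops low-confidence boxes, removes near-duplicates with greedy non-maximum suppression, and converts pixels to PDF points (1/72 inch, origin at the bottom-left). The function names and the output schema (`fields`, `name`, `type`, `page`, `rect`) are invented for illustration, not a standard.

```python
import json


def iou(a, b):
    """Intersection-over-union of two (x0, y0, x1, y1) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def suppress_overlaps(dets, threshold=0.5):
    """Greedy non-maximum suppression, applied within each page."""
    kept = []
    for det in sorted(dets, key=lambda d: d["score"], reverse=True):
        if all(k["page"] != det["page"] or iou(k["box"], det["box"]) < threshold
               for k in kept):
            kept.append(det)
    return kept


def pixels_to_pdf_points(box, dpi, page_height_pt):
    """Map a top-left-origin pixel box to a bottom-left-origin PDF rect [x0, y0, x1, y1].

    Assumes no page rotation and a MediaBox whose lower-left corner is (0, 0).
    """
    scale = 72.0 / dpi  # PDF user-space units are 1/72 inch
    x0, y0, x1, y1 = box
    return [
        round(x0 * scale, 2),
        round(page_height_pt - y1 * scale, 2),  # pixel bottom edge -> PDF lower y
        round(x1 * scale, 2),
        round(page_height_pt - y0 * scale, 2),  # pixel top edge -> PDF upper y
    ]


def to_field_json(dets, dpi, page_height_pt, min_score=0.5):
    """Filter, de-duplicate, convert coordinates, and serialize as JSON."""
    confident = [d for d in dets if d["score"] >= min_score]
    fields = [
        {
            "name": f"field_{i}",  # placeholder; real names might come from OCR of nearby labels
            "type": det["label"],
            "page": det["page"],
            "rect": pixels_to_pdf_points(det["box"], dpi, page_height_pt),
        }
        for i, det in enumerate(suppress_overlaps(confident))
    ]
    return json.dumps({"fields": fields}, indent=2)
```

For example, on a US Letter page (792 pt tall) rendered at 150 dpi, the pixel box `(150, 300, 450, 345)` becomes the PDF rectangle `[72.0, 626.4, 216.0, 648.0]`.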
