Image to text help for personal project

anninasim · March 28, 2025, 3:53pm

Request for assistance with industrial extruder panel analysis project

Problem to solve

I’m working on a project where I need to analyze images of control panels from industrial machines, specifically extruders. The images are PNGs showing various parameters such as flow rate, film properties, thickness, etc.

My goal is to extract all numerical data and information displayed on the panel and then save it in a structured format (table/SQL database). I’ve already attempted the traditional OCR approach, but it doesn’t work well due to noise and the specific formats of the panels.
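For the storage step, a long table (one row per reading) is usually easier to validate and query than one column per parameter. Below is a minimal sketch using Python's built-in sqlite3. It assumes each extracted reading arrives as a (section, parameter, value, unit) tuple, and the table and column names are hypothetical. Values are stored as TEXT so that a reading like "0.050" keeps its exact printed digits instead of passing through a float.

```python
import sqlite3

# Hypothetical schema: one row per reading extracted from one panel image.
SCHEMA = """
CREATE TABLE IF NOT EXISTS panel_readings (
    image_file TEXT NOT NULL,
    section    TEXT NOT NULL,   -- e.g. 'A', 'B', 'C'
    parameter  TEXT NOT NULL,   -- e.g. 'flow_rate'
    value      TEXT NOT NULL,   -- kept as text to preserve exact digits
    unit       TEXT,
    PRIMARY KEY (image_file, section, parameter)
)
"""


def save_readings(conn, image_file, readings):
    """Insert (section, parameter, value, unit) tuples for one image.

    Re-saving the same image/section/parameter replaces the old row.
    """
    conn.execute(SCHEMA)
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO panel_readings VALUES (?, ?, ?, ?, ?)",
            [(image_file, s, p, v, u) for s, p, v, u in readings],
        )
```

Usage would look like `save_readings(sqlite3.connect("panels.db"), "panel_001.png", [("A", "flow_rate", "12.50", "kg/h")])`, where the database and file names are placeholders.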

Example image


As you can see, the panel contains various numerical values organized in a tabular structure with sections (A, B, C), values for flow rate, film properties, etc.
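If an extractor can at least return the panel text line by line, the tabular structure can be recovered with a strict pattern. Any line that does not match gets flagged instead of guessed. The sketch below is minimal and assumes a hypothetical line layout like `A  Flow rate  12.50 kg/h`. A real panel will look different, so treat the regex as illustrative only.

```python
import re

# Hypothetical line format: "<section> <parameter name> <number> [unit]"
LINE_RE = re.compile(
    r"^\s*(?P<section>[ABC])\s+"
    r"(?P<parameter>[A-Za-z][A-Za-z ]*?)\s+"
    r"(?P<value>-?\d+(?:\.\d+)?)\s*"
    r"(?P<unit>[A-Za-z%/°]+)?\s*$"
)


def parse_panel_lines(lines):
    """Split extracted text into parsed records and lines that need manual review."""
    records, rejected = [], []
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            rejected.append(line)  # never guess: a human checks these
            continue
        records.append({
            "section": m["section"],
            "parameter": m["parameter"].strip().lower().replace(" ", "_"),
            "value": m["value"],  # left as a string; convert strictly later
            "unit": m["unit"],
        })
    return records, rejected
```

A line such as `C  Melt temp  2l5 °C`, where an OCR engine read "1" as "l", fails the pattern and lands in `rejected`.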

Approaches tried

I’ve tried Tesseract OCR, but it’s not accurate enough with this type of industrial panel. I’m looking to use more advanced image-to-text models that can run locally.

Specific questions

Which model on Hugging Face would you recommend for extracting structured information from this type of industrial panel?
What would be the best way to fine-tune a model on my specific extruder panels?
Is there a ready-made pipeline for similar use cases?
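On the fine-tuning question: whichever model ends up being used, most of the work is a labeled set of panel images paired with the exact values a human has verified. Below is a minimal sketch that writes those pairs to a `metadata.jsonl` file. That is the file the `datasets` `imagefolder` loader reads, keyed by `file_name`. The `{"gt_parse": ...}` wrapper follows the convention of Donut-style document datasets on the Hub. Treat the exact label format as an assumption and adapt it to whatever training script you choose.

```python
import json
from pathlib import Path


def write_metadata(labels, out_dir):
    """Write metadata.jsonl for an imagefolder-style dataset.

    `labels` maps an image file name to the verified values on that panel,
    e.g. {"panel_001.png": {"A": {"flow_rate": "12.50"}}}.
    """
    out_path = Path(out_dir) / "metadata.jsonl"
    with out_path.open("w", encoding="utf-8") as f:
        for file_name, values in sorted(labels.items()):
            # Donut-style target: the label is itself a JSON string (assumed convention).
            target = json.dumps({"gt_parse": values}, ensure_ascii=False)
            row = {"file_name": file_name, "ground_truth": target}
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
    return out_path
```

Keeping values as strings in the labels means the model is trained to reproduce the digits exactly as the panel prints them.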

Technical requirements

The model must run locally
Preferably implementable in Python
Able to handle numerical values with precision (errors could have significant consequences)
Capable of understanding the tabular structure of the panel
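Because a misread digit matters here, it helps to convert extracted strings with `decimal.Decimal` rather than `float`. Anything that is not unambiguously a number should be rejected, not silently "fixed". The sketch below is minimal. The accepted pattern (optional sign, digits, optional decimal part) is an assumption about how the panel prints numbers.

```python
import re
from decimal import Decimal

# Accept only plain decimal numbers such as "12.50", "-3", "0.050".
NUMBER_RE = re.compile(r"[+-]?\d+(?:\.\d+)?")


def parse_reading(text):
    """Return a Decimal for a clean numeric string, or None if it needs review.

    OCR confusions such as 'O' for '0' or 'l' for '1' are rejected, not corrected.
    """
    cleaned = text.strip()
    if not NUMBER_RE.fullmatch(cleaned):
        return None
    return Decimal(cleaned)  # keeps the printed precision, e.g. Decimal("0.050")
```

`Decimal("0.050")` also records that the panel showed three decimal places, which a float would lose.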

Thanks in advance for your help! This project is very important, and any suggestions will be appreciated.

John6666 · March 28, 2025, 4:40pm

There are many document (or document image) analysis models, but because LLMs and VLMs generate their output rather than read it off deterministically, they are hard to rely on when accuracy is the priority. I think some numerical validation in the post-processing stage will be necessary.
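One form that post-processing could take: check each extracted value against the range the machine can physically produce, and send anything out of range for manual review instead of writing it to the database. The sketch below is minimal. The parameter names and limits are placeholders, not real extruder specifications.

```python
from decimal import Decimal

# Placeholder limits: replace with the real operating ranges of your machines.
LIMITS = {
    "flow_rate": (Decimal("0"), Decimal("500")),    # hypothetical, kg/h
    "thickness": (Decimal("0.005"), Decimal("1")),  # hypothetical, mm
}


def validate(readings):
    """Split {parameter: Decimal or None} into accepted values and review notes."""
    accepted, review = {}, []
    for name, value in readings.items():
        if value is None:
            review.append(f"{name}: could not be read")
        elif name not in LIMITS:
            review.append(f"{name}: no limits defined")
        else:
            low, high = LIMITS[name]
            if low <= value <= high:
                accepted[name] = value
            else:
                review.append(f"{name}: {value} outside [{low}, {high}]")
    return accepted, review
```

Unknown parameters are flagged rather than accepted, so a new field on the panel cannot slip through unchecked.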

The following models are currently applicable and easy to find, but none of them is likely to reach that level of accuracy as a standalone tool…
LayoutLM and similar models have pipelines in Hugging Face Transformers.
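For reference, the LayoutLM route in Transformers is the `document-question-answering` pipeline, which asks one question per field. Below is a minimal sketch, assuming the public `impira/layoutlm-document-qa` checkpoint. That checkpoint relies on `pytesseract` and a local Tesseract install for the OCR step, so it partly inherits Tesseract's weaknesses. The 0.8 score threshold and the field names are hypothetical. They only illustrate keeping low-confidence answers out of the results.

```python
def build_qa():
    """Create the Transformers pipeline (needs transformers, pytesseract
    and a local Tesseract install; not called in this sketch)."""
    from transformers import pipeline
    return pipeline("document-question-answering", model="impira/layoutlm-document-qa")


def ask_panel(qa, image, questions, min_score=0.8):
    """Ask one question per field and keep only confident answers.

    `qa` is called as qa(image=..., question=...), matching the pipeline's
    call style, and should return a list of dicts with "answer" and "score".
    """
    answers, needs_review = {}, {}
    for field, question in questions.items():
        result = qa(image=image, question=question)
        if isinstance(result, dict):  # tolerate a single-dict return
            result = [result]
        best = max(result, key=lambda r: r["score"], default=None)
        if best is not None and best["score"] >= min_score:
            answers[field] = best["answer"]
        else:
            needs_review[field] = best  # None if no answer came back
    return answers, needs_review
```

With a real image this would be called as `ask_panel(build_qa(), Image.open("panel_001.png"), {"flow_rate_a": "What is the flow rate in section A?"})`, with `Image` from Pillow and a hypothetical file name and question. Passing `qa` in as a parameter also makes the thresholding easy to test with a stub. Answers are returned as strings, so they still need strict numeric parsing afterwards.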

In fields where accuracy and domain expertise are required, it may be better to build a pipeline that combines several models (for example, with RAG or agents) rather than relying on a single model. With RAG and agents, for example, you can combine an expensive hosted model such as ChatGPT with inexpensive local models, which widens your options.
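A much simpler way to combine models is to run two independent extractors (say, Tesseract and a local VLM) and accept a value only when both read the same number. This is a different technique from RAG or agents, shown only because it targets accuracy directly. The sketch below is minimal. The two input dicts stand in for whatever your extractors return. Agreement lowers the error rate but is not proof: both models can misread the same digit.

```python
import re
from decimal import Decimal

NUMBER_RE = re.compile(r"[+-]?\d+(?:\.\d+)?")


def _to_number(text):
    """Strictly parse a numeric string; None if missing or not a clean number."""
    if text is None or not NUMBER_RE.fullmatch(text.strip()):
        return None
    return Decimal(text.strip())


def reconcile(first, second):
    """Keep fields where both extractors give the same number; flag the rest."""
    agreed, disputed = {}, {}
    for field in sorted(first.keys() | second.keys()):
        a, b = first.get(field), second.get(field)
        na, nb = _to_number(a), _to_number(b)
        if na is not None and na == nb:
            agreed[field] = na  # "12.5" and "12.50" count as the same value
        else:
            disputed[field] = (a, b)  # raw strings kept for the reviewer
    return agreed, disputed
```

Fields that only one extractor found are treated as disputed, so a missing value is never filled in from a single source.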

Similar topics about PDFs also come up in this forum, so please check the models recommended there.
https://discuss.huggingface.co/search?q=pdf%20order%3Alatest
