Image to text help for personal project

Request for assistance with industrial extruder panel analysis project

Problem to solve

I’m working on a project where I need to analyze images of control panels from industrial machines, specifically extruders. The images are PNGs showing various parameters such as flow rate, film properties, thickness, etc.

My goal is to extract all numerical data and information displayed on the panel and then save it in a structured format (table/SQL database). I’ve already attempted the traditional OCR approach, but it doesn’t work well due to noise and the specific formats of the panels.

Example image

[Attach your example image here]

As you can see, the panel contains various numerical values organized in a tabular structure with sections (A, B, C), values for flow rate, film properties, etc.

Approaches tried

I’ve tried Tesseract OCR, but it’s not accurate enough with this type of industrial panel. I’m looking to use more advanced image-to-text models that can run locally.

Specific questions

  1. Which model on Hugging Face would you recommend for extracting structured information from this type of industrial panel?
  2. What would be the best way to fine-tune a model on my specific extruder panels?
  3. Is there a ready-made pipeline for similar use cases?

Technical requirements

  • The model must run locally
  • Preferably implementable in Python
  • Able to handle numerical values with precision (errors could have significant consequences)
  • Capable of understanding the tabular structure of the panel

Thanks in advance for your help! This project is very important, and any suggestions will be appreciated.

There are many document (or document image) analysis models, but LLMs and VLMs are by nature hard to rely on when accuracy is the priority. I think you will need some degree of numerical validation in the post-processing stage.
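As a rough idea of what that validation could look like, here is a minimal sketch in Python. The field names and limits in `FIELD_LIMITS` are made up for illustration (they are not taken from your panel), and the helper names are hypothetical. The idea is to reject anything that is not a clean number and to flag values outside a plausible range instead of storing them silently.

```python
import re
from typing import Optional, Tuple

# Hypothetical plausibility limits per field (assumed values, replace with
# the real operating limits of your extruder).
FIELD_LIMITS = {
    "flow_rate": (0.0, 500.0),
    "thickness": (0.0, 2000.0),
}

# One optionally signed number with "." or "," as the decimal separator.
NUMBER_RE = re.compile(r"[-+]?\d+(?:[.,]\d+)?")


def parse_reading(raw: str) -> Optional[float]:
    """Return the value only if the whole string is one clean number."""
    text = raw.strip()
    if not NUMBER_RE.fullmatch(text):
        return None  # e.g. "12O.5", where a letter O was misread for a zero
    return float(text.replace(",", "."))


def validate(field: str, raw: str) -> Tuple[Optional[float], str]:
    """Return (value, status) so that bad readings can go to manual review."""
    value = parse_reading(raw)
    if value is None:
        return None, "unparseable"
    limits = FIELD_LIMITS.get(field)
    if limits is None:
        return value, "unknown_field"
    low, high = limits
    if not (low <= value <= high):
        return value, "out_of_range"
    return value, "ok"
```

Only rows with status `"ok"` would go into the database. Everything else is better sent to a person for review than guessed at.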

The following are currently applicable and easy to find, but they probably cannot do the whole job as standalone tools…
LayoutLM and similar models have a pipeline in Hugging Face’s Transformers.
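A minimal sketch of calling that pipeline locally follows. The checkpoint `impira/layoutlm-document-qa` is just one publicly available example, not a recommendation tuned for extruder panels, and the question wording, `min_score` threshold, and helper names are assumptions. Note that this pipeline runs Tesseract (via `pytesseract`) by default to find the words on the image. Since Tesseract struggled on your panels, you may need to pass your own `word_boxes` from another OCR engine instead.

```python
from typing import Dict, Iterable, Optional


def build_questions(fields: Iterable[str]) -> Dict[str, str]:
    """Turn field names into one natural-language question per field."""
    return {name: f"What is the {name}?" for name in fields}


def read_panel(image_path: str, fields: Iterable[str],
               min_score: float = 0.5) -> Dict[str, Optional[str]]:
    """Ask a document-QA model for each field; low-confidence answers become None."""
    # Imported here so build_questions() works without transformers installed.
    from transformers import pipeline

    qa = pipeline(
        "document-question-answering",
        model="impira/layoutlm-document-qa",  # example checkpoint (assumption)
    )
    results: Dict[str, Optional[str]] = {}
    for name, question in build_questions(fields).items():
        answers = qa(image=image_path, question=question)
        if isinstance(answers, dict):  # normalize in case a single dict comes back
            answers = [answers]
        best = answers[0] if answers else None
        if best is not None and best["score"] >= min_score:
            results[name] = best["answer"]
        else:
            results[name] = None  # leave for manual review
    return results
```

Usage would be something like `read_panel("panel.png", ["flow rate", "thickness"])`, where `panel.png` is a placeholder for one of your images. The answers come back as strings and still need numerical checking before you store them.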

In fields where accuracy and domain expertise are required, it may be better to combine multiple generative AI components, for example with RAG, rather than trying to do everything with a single model. With RAG and agents you can, for example, pair an expensive model such as ChatGPT with inexpensive local models, so I think your options will widen.
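One simple way to combine models, sketched below under my own assumptions, is to run two independent extractors on the same image and accept a value only when both agree. This is not RAG itself, just a cross-check. The function name `reconcile` and the dictionary layout are hypothetical.

```python
from typing import Dict, Optional, Tuple

Readings = Dict[str, Optional[float]]


def reconcile(primary: Readings, secondary: Readings,
              tolerance: float = 0.0
              ) -> Tuple[Dict[str, float], Dict[str, tuple]]:
    """Split fields into values both extractors agree on and values they dispute."""
    accepted: Dict[str, float] = {}
    disputed: Dict[str, tuple] = {}
    for field in sorted(set(primary) | set(secondary)):
        a = primary.get(field)
        b = secondary.get(field)
        if a is not None and b is not None and abs(a - b) <= tolerance:
            accepted[field] = a
        else:
            # Missing in one extractor or different values: do not guess.
            disputed[field] = (a, b)
    return accepted, disputed


if __name__ == "__main__":
    # Made-up readings from two hypothetical extractors.
    model_a = {"flow_rate": 120.5, "thickness": 25.0}
    model_b = {"flow_rate": 120.5, "thickness": 26.0}
    print(reconcile(model_a, model_b))
```

Disputed fields could then be sent to a stronger (and more expensive) model or to a human, which keeps the costly step limited to the cases that need it.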

Similar questions about PDFs also come up in this forum, so please check the models recommended in those threads.
https://discuss.huggingface.co/search?q=pdf%20order%3Alatest