I have a PNG image. In certain parts of the image, there are numbers. I would like to read these numbers programmatically. Which model would you recommend?
The numbers are always going to be in exact locations, i.e. I would know the coordinates of the sub-images containing the numbers.
There are quite a lot of options available under those conditions. Since you say the coordinates are exact, you could probably read the numbers with a plain OpenCV program and no AI model at all, which would be the fastest option. If speed isn't that critical, it would be easier to use an OCR tool like pytesseract (a Python wrapper around the Tesseract OCR engine).
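For the pytesseract route, here is a minimal sketch, assuming Pillow, pytesseract, and the Tesseract binary are installed. The region names and coordinates in `REGIONS` are made up for illustration, and the helper names (`to_pil_box`, `keep_digits`, `read_numbers`) are my own, not part of any library.

```python
import re

# Hypothetical regions as (x, y, width, height) in pixels; replace with your own.
REGIONS = {
    "score": (10, 20, 80, 30),
    "level": (200, 20, 40, 30),
}


def to_pil_box(region):
    """Convert (x, y, w, h) to the (left, upper, right, lower) box PIL expects."""
    x, y, w, h = region
    return (x, y, x + w, y + h)


def keep_digits(text):
    """Keep only digits from raw OCR output.

    Note: this also drops decimal points and minus signs, so adjust it
    if your numbers are not plain non-negative integers.
    """
    return re.sub(r"\D", "", text)


def read_numbers(image_path, regions=REGIONS):
    """Crop each known region and run Tesseract on it."""
    # Imported here so the helpers above work without the OCR stack installed.
    from PIL import Image
    import pytesseract

    image = Image.open(image_path).convert("L")  # grayscale
    results = {}
    for name, region in regions.items():
        crop = image.crop(to_pil_box(region))
        # --psm 7 treats the crop as a single line of text;
        # the whitelist restricts recognition to digits.
        raw = pytesseract.image_to_string(
            crop, config="--psm 7 -c tessedit_char_whitelist=0123456789"
        )
        results[name] = keep_digits(raw)
    return results
```

You would call it as `read_numbers("your_image.png")` and get back a dict of strings keyed by region name. If the crops are small, upscaling them before OCR often helps, but that depends on your image.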
I think it would also be possible to use a highly versatile VLM like Qwen2.5-VL or Aya Vision, but that would be overkill for this task.
If the conditions are that good, it might even be doable with technology from 13 years ago. But if you leave the processing to a high-performance OCR model, you can save yourself the trouble of writing that code.
For edge devices, it might be better to use classical image-processing algorithms (for example with OpenCV), but if you have a GPU, I think it would be easier to use a generative AI model.
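To give an idea of what the classical route looks like, here is a rough sketch of template matching at fixed positions. It uses plain Python lists as a tiny binary "image" so it runs without any dependencies. On a real image you would load and threshold the pixels with OpenCV and could use `cv2.matchTemplate` for the comparison. The 3x5 digit templates, the digit spacing, and all function names are assumptions made up for this example.

```python
# Hypothetical 3x5 binary templates (1 = ink, 0 = background).
TEMPLATES = {
    "0": [[1, 1, 1], [1, 0, 1], [1, 0, 1], [1, 0, 1], [1, 1, 1]],
    "1": [[0, 1, 0], [1, 1, 0], [0, 1, 0], [0, 1, 0], [1, 1, 1]],
    "7": [[1, 1, 1], [0, 0, 1], [0, 1, 0], [0, 1, 0], [0, 1, 0]],
}


def crop(image, x, y, w, h):
    """Return the w-by-h sub-image whose top-left corner is (x, y)."""
    return [row[x:x + w] for row in image[y:y + h]]


def difference(a, b):
    """Sum of absolute pixel differences between two equally sized patches."""
    return sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def classify(patch, templates=TEMPLATES):
    """Pick the label of the template closest to the patch."""
    return min(templates, key=lambda label: difference(patch, templates[label]))


def read_number(image, x, y, digit_count, w=3, h=5, gap=1):
    """Read digit_count digits laid out left to right starting at (x, y)."""
    digits = []
    for i in range(digit_count):
        patch = crop(image, x + i * (w + gap), y, w, h)
        digits.append(classify(patch))
    return "".join(digits)


def paste(image, template, x, y):
    """Draw a template into the image (only used to build the demo image)."""
    for dy, row in enumerate(template):
        for dx, value in enumerate(row):
            image[y + dy][x + dx] = value


# Build a 13x7 demo image containing "170" starting at (1, 1).
image = [[0] * 13 for _ in range(7)]
for i, ch in enumerate("170"):
    paste(image, TEMPLATES[ch], 1 + i * 4, 1)

print(read_number(image, 1, 1, 3))  # 170
```

Because the positions are known, there is no searching involved: each digit cell is just compared against ten (here three) templates, which is cheap enough for an edge device. This only works if the font and size of the digits stay the same across images.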