Best model to read codes from small torn paper snippets (OCR)

Hi everyone,

I’m working on a task that involves reading 9-character alphanumeric codes from small paper snippets, similar to voucher codes or printed serials. Here’s an example image:

I have about 300 such images that I can use for fine-tuning. The goal is to either:

  • Use a pre-trained model out-of-the-box, or
  • Fine-tune a suitable OCR model to extract the 9-character string accurately.

So far, I’ve tried the following:

  • TrOCR: Fine-tuned it on my dataset, but it didn’t yield great results, possibly due to suboptimal training settings.
  • SmolDocling: Lightweight but not very accurate on my dataset.
  • Llama 3.2 Vision: Works to some extent, but not reliable for precise character reading.
  • YOLO (custom-trained): Trained an object detection model to detect individual characters, then sort the detections left to right and concatenate them into a string (see the sketch after this list). This has given the best results so far, but it fails on edge cases (e.g. poor detection of “I”).
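
For reference, the detection-to-string step looks roughly like this (a minimal sketch, assuming an Ultralytics YOLO model trained with one class per character; the weights and image paths are placeholders):

```python
from ultralytics import YOLO

model = YOLO("best.pt")           # custom character-detection weights
result = model("snippet.png")[0]  # single image -> first result

# Pair each detection's horizontal center with its class name,
# then sort left to right and concatenate into the code string.
detections = []
for box in result.boxes:
    x1, _, x2, _ = box.xyxy[0].tolist()
    detections.append(((x1 + x2) / 2, model.names[int(box.cls)]))

code = "".join(name for _, name in sorted(detections))
print(code)  # ideally a 9-character string
```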

I suspect that a model specialized in OCR text recognition, especially for short codes, would work better than object detection or large vision-language models.

Any suggestions for models or approaches that would suit this task well? Bonus points if the model is relatively lightweight and easy to deploy.


I think PyTesseract is easy to use from Python, and since the layout isn’t complicated, quite a few models should be able to handle this…
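
For example, a minimal sketch (the paths and character whitelist are assumptions about your data, and the Tesseract binary must be installed separately):

```python
from PIL import Image
import pytesseract

img = Image.open("snippet.png")
# --psm 7: treat the image as a single line of text.
# The whitelist limits output to characters the codes can contain.
config = "--psm 7 -c tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
print(pytesseract.image_to_string(img, config=config).strip())
```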

https://discuss.huggingface.co/search?q=ocr%20order%3Alatest

Llama 3.2 Vision: Works to some extent, but not reliable for precise character reading.

Llama 3.2 Vision isn’t bad, but Aya Vision and Qwen 2.5 VL are slightly stronger models, so they might be worth trying.
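
Here’s a rough sketch of prompting Qwen 2.5 VL through transformers, following the usage pattern from the model card (the model ID, prompt wording, and image path are assumptions to adapt; it also needs `pip install qwen-vl-utils`):

```python
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model_id = "Qwen/Qwen2.5-VL-7B-Instruct"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "snippet.png"},
        {"type": "text",
         "text": "Read the 9-character alphanumeric code in this image. "
                 "Reply with the code only."},
    ],
}]

# Build the chat prompt and pack the image the way the processor expects.
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)

out = model.generate(**inputs, max_new_tokens=16)
# Strip the prompt tokens so only the generated answer is decoded.
trimmed = [o[len(i):] for i, o in zip(inputs.input_ids, out)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0].strip())
```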