Best model to read codes from small torn paper snippets (OCR)

Hi everyone,

I’m working on a task that involves reading 9-character alphanumeric codes from small paper snippets, similar to voucher codes or printed serials. Here’s an example image:

I have about 300 such images that I can use for fine-tuning. The goal is to either:

  • Use a pre-trained model out-of-the-box, or
  • Fine-tune a suitable OCR model to extract the 9-character string accurately.

So far, I’ve tried the following:

  • TrOCR: Fine-tuned it on my dataset, but it didn’t yield great results, possibly due to suboptimal training settings.
  • SmolDocling: Lightweight but not very accurate on my dataset.
  • Llama 3.2 Vision: Works to some extent, but not reliable for precise character reading.
  • YOLO (custom-trained): Trained an object detection model to detect individual characters, then sort the detections left to right and concatenate them into a string (see the sketch after this list). This has given the best results so far, but it fails on edge cases (e.g. poor detection of “I”).
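
For reference, the detection-to-string step looks roughly like this (a minimal sketch, assuming an Ultralytics YOLO model trained with one class per character; the weights and image paths are placeholders):

```python
from ultralytics import YOLO

model = YOLO("best.pt")           # custom character-detection weights
result = model("snippet.png")[0]  # single image -> first result

# Pair each detection's horizontal center with its class name,
# then sort left to right and concatenate into the code string.
detections = []
for box in result.boxes:
    x1, _, x2, _ = box.xyxy[0].tolist()
    detections.append(((x1 + x2) / 2, model.names[int(box.cls)]))

code = "".join(name for _, name in sorted(detections))
print(code)  # ideally a 9-character string
```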

I suspect that a model specialized in OCR text recognition, especially for short codes, would work better than object detection or large vision-language models.

Any suggestions for models or approaches that would suit this task well? Bonus points if the model is relatively lightweight and easy to deploy.


I think PyTesseract is easy to use from Python, and since the layout isn’t complicated, quite a few models should be able to handle this…
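
For example, a minimal sketch (the paths and character whitelist are assumptions about your data, and the Tesseract binary must be installed separately):

```python
from PIL import Image
import pytesseract

img = Image.open("snippet.png")
# --psm 7: treat the image as a single line of text.
# The whitelist limits output to characters the codes can contain.
config = "--psm 7 -c tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
print(pytesseract.image_to_string(img, config=config).strip())
```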

https://discuss.huggingface.co/search?q=ocr%20order%3Alatest

Llama 3.2 Vision: Works to some extent, but not reliable for precise character reading.

Llama 3.2 Vision isn’t bad, but Aya Vision and Qwen 2.5 VL are slightly stronger models, so they might be worth trying.
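
Here’s a rough sketch of prompting Qwen 2.5 VL through transformers, following the usage pattern from the model card (the model ID, prompt wording, and image path are assumptions to adapt; it also needs `pip install qwen-vl-utils`):

```python
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model_id = "Qwen/Qwen2.5-VL-7B-Instruct"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "snippet.png"},
        {"type": "text",
         "text": "Read the 9-character alphanumeric code in this image. "
                 "Reply with the code only."},
    ],
}]

# Build the chat prompt and pack the image the way the processor expects.
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)

out = model.generate(**inputs, max_new_tokens=16)
# Strip the prompt tokens so only the generated answer is decoded.
trimmed = [o[len(i):] for i, o in zip(inputs.input_ids, out)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0].strip())
```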