In principle, a VLM (vision-language model) can handle this task, but recognizing real handwritten characters and text is quite difficult. I recommend trying out various models online first, then running the ones that work well locally. With the VRAM savings from quantization, some models can run in 6 GB.
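To make the quantization point concrete, here is a rough back-of-the-envelope sketch of how much memory the weights alone need at different precisions. The 7B parameter count is only an illustrative assumption, not a specific model recommendation, and the estimate ignores activations, the KV cache, and the vision encoder's image tokens, so real usage will be higher.

```python
def weight_memory_gib(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory needed for the model weights only, in GiB.

    This is a lower bound: activations, KV cache, and framework
    overhead are not included.
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    # Hypothetical 7B-parameter model at common precisions.
    for bits in (16, 8, 4):
        gib = weight_memory_gib(7, bits)
        print(f"7B model at {bits}-bit: ~{gib:.1f} GiB for weights")
```

By this estimate a 7B model needs roughly 13 GiB at 16-bit but only about 3.3 GiB at 4-bit, which is why a 4-bit quantized model of that size can plausibly fit on a 6 GB card while the unquantized one cannot.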
I expect the model trocr-base-handwritten to extract all the text from the picture.
But the result I got from it is sentiment.
Full code:
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image
p = 'picture.png'
processor = TrOCRProcessor.from_pretrained("trocr-base-handwritten/")
model = VisionEncoderDecoderModel.from_pretrained("trocr-base-handwritten/")
image = Image.open(p)
image_rgb = image.convert('RGB')
pixels = proces…
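The snippet above is cut off, so what follows is not a completion of it but a separate, hedged sketch. As far as I know, the TrOCR handwritten checkpoints are meant for single text-line images, so passing a whole page can produce a single unrelated word. One common workaround is to split the page into line crops first and recognize each crop separately. The helpers `find_line_bands`, `row_ink_counts`, and `recognize_page` are hypothetical names made up for this example, the thresholds are guesses, and the simple horizontal-projection split assumes dark ink on a light background with reasonably straight lines.

```python
def find_line_bands(row_ink, min_ink=1, min_height=2):
    """Return (top, bottom) row ranges of consecutive rows that contain ink.

    row_ink[i] is the number of dark pixels in row i. Bands shorter than
    min_height rows are treated as noise and dropped.
    """
    bands, start = [], None
    for y, ink in enumerate(row_ink):
        if ink >= min_ink and start is None:
            start = y                      # a text band begins
        elif ink < min_ink and start is not None:
            if y - start >= min_height:
                bands.append((start, y))   # a text band ends (bottom exclusive)
            start = None
    if start is not None and len(row_ink) - start >= min_height:
        bands.append((start, len(row_ink)))
    return bands


def row_ink_counts(gray_image, dark_below=128):
    """Count dark pixels per row of a PIL image in mode 'L'."""
    width, height = gray_image.size
    data = gray_image.tobytes()  # mode 'L': one byte per pixel, row-major
    return [
        sum(1 for v in data[y * width:(y + 1) * width] if v < dark_below)
        for y in range(height)
    ]


def recognize_page(path, model_dir="trocr-base-handwritten/"):
    """Run TrOCR on each detected line band and return the decoded strings."""
    # Heavy imports are deferred so the helpers above work without them.
    from PIL import Image
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    processor = TrOCRProcessor.from_pretrained(model_dir)
    model = VisionEncoderDecoderModel.from_pretrained(model_dir)

    image = Image.open(path).convert("RGB")
    bands = find_line_bands(row_ink_counts(image.convert("L")))

    lines = []
    for top, bottom in bands:
        crop = image.crop((0, top, image.width, bottom))
        pixel_values = processor(images=crop, return_tensors="pt").pixel_values
        generated_ids = model.generate(pixel_values)
        text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
        lines.append(text)
    return lines
```

Calling `recognize_page("picture.png")` would then return one string per detected line. For skewed, ruled, or densely written pages this naive split will likely fail, and a dedicated text-line detector would be a better first stage.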
Hello everyone,
I am currently looking for suggestions on implementing a parsing pipeline for handwritten, unstructured invoices.
What open-source models do you recommend for handwritten OCR/parsing?
I have tried EasyOCR, Qwen, Intern-MPO, and LayoutLM, but they all seem to achieve poor results on handwritten invoices.
The idea is to find an open-source alternative to Textract OCR, so that I can fine-tune it when Textract performs poorly.
Thank you!