Handwriting recognition. Can't recognize multiline words

kvdm-dev · May 10, 2023, 5:20pm

I expect the model trocr-base-handwritten to extract all the text from the picture.

But the only result I get from it is "sentiment".

Full code:

```python
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

p = 'picture.png'
# path to a local copy of the checkpoint (presumably microsoft/trocr-base-handwritten from the Hub)
processor = TrOCRProcessor.from_pretrained("trocr-base-handwritten/")
model = VisionEncoderDecoderModel.from_pretrained("trocr-base-handwritten/")

image = Image.open(p)
image_rgb = image.convert('RGB')  # the processor expects 3-channel input
# the whole picture is resized to the encoder's fixed input size
pixels = processor(image_rgb, return_tensors="pt").pixel_values
generated_ids = model.generate(pixels)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)
print(generated_text)

# generated_text output
# ['sentiment']
```

How can I configure or improve the model so that it returns at least something close to the phrase "sense of humor"?

jhhf · May 10, 2023, 10:51pm

I was wondering the exact same thing today and posted a topic: "How to do full page analysis with TrOCR (integrating with text segmentation analysis)".

I am a complete novice, but I believe this happens because TrOCR is trained on single text-line images?

@nielsr mentioned that we could use a separate text-line segmentation program and then feed its output (line-by-line crops, or bounding-box results) into TrOCR. I don't have a clue how to do that, and am hoping he or the other fine folks on the @huggingface forums might offer some helpful suggestions…
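A minimal sketch of that idea, assuming dark ink on a light background, a roughly straight page, and lines that don't overlap vertically. It uses a horizontal projection profile (counting dark pixels per row) to find the gaps between lines, crops each line, and sends the crops to TrOCR as one batch. This is a naive stand-in for a real text-line segmenter, not the specific tool @nielsr had in mind; `segment_lines`, `recognize_lines` and their threshold values are hypothetical names chosen for illustration.

```python
import numpy as np
from PIL import Image


def segment_lines(image: Image.Image, dark_threshold: int = 128,
                  min_ink: int = 1, min_height: int = 5):
    """Return (left, top, right, bottom) boxes for horizontal text lines.

    Naive horizontal projection profile: a row belongs to a line if it
    contains at least `min_ink` pixels darker than `dark_threshold`.
    Assumes dark ink on a light background and non-overlapping lines.
    """
    gray = np.asarray(image.convert("L"))
    height, width = gray.shape
    ink_per_row = (gray < dark_threshold).sum(axis=1)  # dark pixels per row
    is_text_row = ink_per_row >= min_ink

    boxes = []
    start = None
    for y in range(height):
        if is_text_row[y] and start is None:
            start = y  # a line begins
        elif not is_text_row[y] and start is not None:
            if y - start >= min_height:  # ignore specks thinner than min_height
                boxes.append((0, start, width, y))
            start = None
    if start is not None and height - start >= min_height:
        boxes.append((0, start, width, height))  # line touching the bottom edge
    return boxes


def recognize_lines(image: Image.Image, processor, model, **seg_kwargs):
    """Run TrOCR on each detected line crop; returns one string per line."""
    rgb = image.convert("RGB")
    crops = [rgb.crop(box) for box in segment_lines(rgb, **seg_kwargs)]
    if not crops:
        return []
    # the processor resizes every crop to the encoder's input size, so each
    # line gets the full input resolution instead of sharing it with the page
    pixel_values = processor(images=crops, return_tensors="pt").pixel_values
    generated_ids = model.generate(pixel_values)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)
```

With the `processor`, `model` and `image_rgb` from the first post, `recognize_lines(image_rgb, processor, model)` should return one string per detected line, top to bottom. Slanted or touching handwriting will defeat a projection profile, which is where a dedicated text detector helps.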

kvdm-dev · May 11, 2023, 7:07am

Hey, thanks for sharing your concern.

I noticed that single-line pictures are recognized quite well compared to multi-line ones.
From the multi-line pictures I often get meaningless results, such as numbers like "1953 54".

A suggested workaround would be great. Let's wait for an answer.

kvdm-dev · May 11, 2023, 7:08am

@jhhf By the way, have you considered other models or solutions for handwriting recognition?

jhhf · May 12, 2023, 6:29pm

@kvdm-dev - yes, I've been looking at LayoutLMv2, LayoutLMv3 and Donut, but I agree with you that the single-line transformers trained on a specific language seem to be quite good! @nielsr → any suggestions?

kvdm-dev · May 13, 2023, 2:19pm

And what is your experience with LayoutLMv3 and Donut? Is TrOCR better in your opinion?

jhhf · May 13, 2023, 6:20pm

Hoping to try LayoutLMv3 (non-commercial license) and Donut (MIT license) this week. I came across the following two yesterday and think they may work for us:

Craft-Text-Detector (updated 2022): craft-text-detector

Craft-HW-OCR (updated 2021): craft-hw-ocr, based on Craft-Text-Detector, sends its results to TrOCR!
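Text detectors of this kind find word or region boxes rather than whole lines, so before cropping for TrOCR it can help to merge boxes that sit on the same line. The sketch below assumes the detector output has already been converted to axis-aligned `(left, top, right, bottom)` boxes (an assumption; the actual output format of these tools may differ, e.g. polygons) and groups boxes whose vertical extents overlap enough. `group_boxes_into_lines` and its `overlap` ratio are hypothetical, chosen for illustration.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), same order as PIL's crop


def group_boxes_into_lines(boxes: List[Box], overlap: float = 0.5) -> List[Box]:
    """Merge word-level boxes into line-level boxes, ordered top to bottom.

    A box joins an existing line when its vertical overlap with that line,
    relative to the smaller of the two heights, is at least `overlap`.
    """
    lines: List[List[Box]] = []
    # visit boxes by vertical centre so each line is seeded from the top down
    for box in sorted(boxes, key=lambda b: (b[1] + b[3]) / 2):
        for line in lines:
            top = min(b[1] for b in line)
            bottom = max(b[3] for b in line)
            intersection = min(bottom, box[3]) - max(top, box[1])
            smaller_height = min(bottom - top, box[3] - box[1])
            if smaller_height > 0 and intersection / smaller_height >= overlap:
                line.append(box)
                break
        else:
            lines.append([box])  # no matching line: start a new one

    # one enclosing box per line; crop these and pass them to TrOCR
    merged = [
        (min(b[0] for b in line), min(b[1] for b in line),
         max(b[2] for b in line), max(b[3] for b in line))
        for line in lines
    ]
    return sorted(merged, key=lambda b: b[1])
```

For example, two words at the same height and two words lower down collapse into two line boxes, which can then be passed to `image.crop(...)` and batched into the processor as in the original code.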

If you test any of the above, let me know your results please!

kvdm-dev · May 14, 2023, 1:01pm

Yeah, sure.

For the time being I'm busy with other stuff, but later I'm going to get back to improving the recognition quality, since my goal is to read multi-line handwriting.
