Handwriting recognition. Can't recognize multiline words

I expect the model trocr-base-handwritten to extract all the text from the picture.

But the result I got from it is just the single word "sentiment".

Full code:

from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

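p = 'picture.png'
# Load the processor and model from the trocr-base-handwritten checkpoint directory
processor = TrOCRProcessor.from_pretrained("trocr-base-handwritten/")
model = VisionEncoderDecoderModel.from_pretrained("trocr-base-handwritten/")
# The processor expects a 3-channel image, so convert to RGB first
image = Image.open(p)
image_rgb = image.convert('RGB')
# Resize and normalize the whole picture into one tensor
pixels = processor(image_rgb, return_tensors="pt").pixel_values
# Generate token ids and decode them into a list with one string per image
generated_ids = model.generate(pixels)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)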

# generated_text output 
# ['sentiment']

How can I configure or improve the model to get at least something close to the actual phrase, "sense of humor"?

Was thinking the exact same thing today, and posted a topic at: How to do full page analysis with TrOCR (integrating with text segmentation analysis)

I am a complete novice, but believe this is because the TrOCR model is based on single line input?

@nielsr mentioned we could use a separate text line segmentation program to then feed those line-by-line, or bounding box results into TrOCR. Don’t have a clue how to do that, and am hoping he or the other fine folks @huggingface forums might offer some helpful suggestions…
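A minimal sketch of that line-by-line idea, assuming dark handwriting on a light, roughly horizontal page. It is not @nielsr's method or a real segmentation library: it just counts dark pixels per row, treats runs of inked rows as text lines, and passes each crop to TrOCR. The names find_line_bands and recognize_lines, and the values for threshold, min_ink and min_height, are made up for illustration; processor and model are the objects loaded in the original post.

```python
from typing import List, Tuple


def find_line_bands(row_ink: List[int], min_ink: int = 1,
                    min_height: int = 5) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges whose rows contain at least min_ink dark pixels."""
    bands = []
    start = None
    for y, ink in enumerate(row_ink):
        if ink >= min_ink and start is None:
            start = y  # first inked row of a new band
        elif ink < min_ink and start is not None:
            if y - start >= min_height:  # drop specks thinner than min_height
                bands.append((start, y))
            start = None
    # Close a band that runs to the bottom edge of the image
    if start is not None and len(row_ink) - start >= min_height:
        bands.append((start, len(row_ink)))
    return bands


def recognize_lines(image, processor, model, threshold: int = 128) -> List[str]:
    """Split a PIL image into horizontal text bands and run TrOCR on each one."""
    gray = image.convert("L")
    width, height = gray.size
    pixels = gray.tobytes()  # one byte per pixel, row by row
    # Number of pixels darker than the threshold in each row
    row_ink = [
        sum(1 for v in pixels[y * width:(y + 1) * width] if v < threshold)
        for y in range(height)
    ]
    texts = []
    for top, bottom in find_line_bands(row_ink):
        crop = image.crop((0, top, width, bottom)).convert("RGB")
        pixel_values = processor(crop, return_tensors="pt").pixel_values
        generated_ids = model.generate(pixel_values)
        texts.append(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
    return texts
```

With the objects from the first post this would be called as recognize_lines(image_rgb, processor, model). Slanted, overlapping, or noisy lines will probably break this kind of row counting, which is where a real text detector would be the better choice.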

Hey, thx for sharing your concern.

I noticed that single-line pictures are recognized quite well compared to multi-line ones.
From the multi-line ones I often get meaningless numbers like 1953 54.

A suggestion for a workaround would be great. Let’s wait for an answer.

@jhhf By the way, have you considered other models or solutions for handwriting recognition?

@kvdm-dev - yes, I’ve been looking at LayoutLMv2, LayoutLMv3 and Donut, but I agree with you that the one-line transformers trained on a specific language seem to be quite good! @nielsr → any suggestions?


And what is your experience with LayoutLMv3 and Donut? Is TrOCR better in your opinion?

Hoping to try LayoutLMv3 (N.C.) and Donut (MIT) this week. Came across these two yesterday, and think they may work for us:

Craft-Text-Detector (Updated 2022)

Craft-HW-OCR (Updated 2021) - Based on Craft-Text-Detector, and sends results to TrOCR!
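Whichever detector ends up producing the boxes, they still have to be put into reading order before the crops go to TrOCR. Here is a rough sketch of that step only. It assumes the boxes arrive as (left, top, right, bottom) tuples, which may not match what Craft-Text-Detector actually returns (I have not checked its output format), and group_boxes_into_lines, line_crop_box and the tolerance value are hypothetical names and numbers.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), an assumed format


def group_boxes_into_lines(boxes: List[Box], tolerance: float = 0.5) -> List[List[Box]]:
    """Group word boxes into text lines (top to bottom), each sorted left to right."""
    lines: List[List[Box]] = []
    # Visit boxes from top to bottom by vertical center
    for box in sorted(boxes, key=lambda b: (b[1] + b[3]) / 2):
        center = (box[1] + box[3]) / 2
        height = box[3] - box[1]
        if lines:
            last = lines[-1]
            last_center = sum((b[1] + b[3]) / 2 for b in last) / len(last)
            # Same line if the centers are within a fraction of the box height
            if abs(center - last_center) <= tolerance * height:
                last.append(box)
                continue
        lines.append([box])
    return [sorted(line, key=lambda b: b[0]) for line in lines]


def line_crop_box(line: List[Box]) -> Box:
    """Smallest box covering every word box in one line, usable with PIL's Image.crop."""
    return (
        min(b[0] for b in line),
        min(b[1] for b in line),
        max(b[2] for b in line),
        max(b[3] for b in line),
    )
```

Each line_crop_box result could then be passed to image.crop(...) and the crop fed to the processor and model one line at a time, the same way as a single-line picture.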

If you test any of the above, let me know your results please!

Yeah, sure.

For the time being I’m busy with other stuff, but later I’m going to get back to improving recognition quality, since my goal is to recognize multi-line handwriting.