Multi-Task Model - OCR + Object Detection

HoussemBettaibi · June 8, 2023, 2:51pm

Hello Everyone,

I’m new to Transformers and the Hugging Face ecosystem in general.

I need some guidance with a project that is part of my studies: creating a single model that can handle two tasks related to document processing. It takes as input an image containing handwritten text, signatures, and stamps. The objective is to (1) detect whether a signature and a stamp exist in the image (and then extract them by defining bounding boxes around them) and (2) extract the handwritten text.

I thought model architectures like TrOCR and LayoutLM might help.

Any suggestions on how to build such a model, or any scientific papers/blogs that might point me in the right direction?

Many Thanks,

Cheers!
