TrOCR fine-tuning

Anonumous · January 2, 2022, 9:22am

Hello, I fine-tuned microsoft/trocr-small-stage1 on my dataset. When I test my model on a validation sample, inference takes several hours, while microsoft/trocr-small-handwritten copes with it in 10 minutes. What could be the problem? Is it possible to speed up inference somehow?

StephennFernandes · February 1, 2022, 8:32am

I am also thinking of training TrOCR for Kannada, and I was able to find a BERT model for Kannada on Hugging Face. How do I generate a dataset that can be used for training TrOCR?

nielsr · February 1, 2022, 9:07am

For training TrOCR, you just need a dataset of (image, text) pairs.
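As a minimal sketch of what such pairs can look like on disk, assume a hypothetical tab-separated `labels.tsv` in which each row holds an image file name followed by its transcription. The file name, the row layout, and the `load_pairs` helper are illustrative assumptions, not something TrOCR requires:

```python
import csv
from pathlib import Path


def load_pairs(labels_file, image_dir):
    """Read (image_path, text) pairs from a tab-separated labels file.

    Assumed (hypothetical) row layout: <image file name> TAB <transcription>
    """
    pairs = []
    with open(labels_file, newline="", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE):
            if len(row) != 2 or not row[1].strip():
                continue  # skip malformed rows and empty transcriptions
            pairs.append((Path(image_dir) / row[0], row[1].strip()))
    return pairs
```

Each pair can then be converted into model inputs, for example with `TrOCRProcessor`: pixel values from the image and label ids from the tokenized text.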

To speed up inference, you can either (1) run on a GPU or (2) look into optimizations such as ONNX, quantization, etc.
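For option (1), a rough sketch of batched GPU inference could look like the following. The `chunks` and `recognize` helpers and the batch size of 16 are assumptions chosen for illustration; swap in the path of your own fine-tuned checkpoint for the model name:

```python
def chunks(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def recognize(image_paths, model_name="microsoft/trocr-small-handwritten", batch_size=16):
    """Run TrOCR over line images in batches, on GPU when one is available."""
    # Heavy imports are kept local so the helper above works without them.
    import torch
    from PIL import Image
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    device = "cuda" if torch.cuda.is_available() else "cpu"
    processor = TrOCRProcessor.from_pretrained(model_name)
    model = VisionEncoderDecoderModel.from_pretrained(model_name).to(device).eval()

    texts = []
    for batch in chunks(list(image_paths), batch_size):
        images = [Image.open(path).convert("RGB") for path in batch]
        pixel_values = processor(images=images, return_tensors="pt").pixel_values.to(device)
        with torch.no_grad():  # no gradients are needed at inference time
            generated_ids = model.generate(pixel_values)
        texts.extend(processor.batch_decode(generated_ids, skip_special_tokens=True))
    return texts
```

Loading the model once and sending several line images through `generate` per call usually matters as much as the device itself; processing one image at a time leaves most of the GPU idle.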

StephennFernandes · February 6, 2022, 10:40am

When you say image-text pairs, can I provide an entire image of a page that has multiple lines, along with a corresponding .txt file containing the exact same lines in the same positions? Does TrOCR support multi-line page inputs?

If not, how do I produce a line-by-line dataset?

nielsr · February 7, 2022, 8:25pm

TrOCR itself was trained on single-line text images. This was a choice made by Microsoft. They used a text detector to get individual single-line text images from documents. Examples of detectors you can use are CRAFT or the text detectors available in DocTR.
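Once a detector has produced one box per text line, a line-by-line dataset can be assembled roughly as sketched here. This assumes a single-column page, boxes given as `(left, top, right, bottom)` pixel tuples, and a .txt transcript with one row per line in reading order; CRAFT and DocTR each have their own output formats, so converting to this box format is left to you, and both helper names are hypothetical:

```python
def pair_lines(boxes, page_text):
    """Match detected line boxes to transcript lines, top to bottom.

    boxes: (left, top, right, bottom) pixel tuples, one per text line
           (an assumed format; convert your detector's output to it).
    page_text: transcript of the page, one text line per row.
    """
    lines = [ln.strip() for ln in page_text.splitlines() if ln.strip()]
    if len(lines) != len(boxes):
        raise ValueError(f"{len(boxes)} boxes but {len(lines)} transcript lines")
    ordered = sorted(boxes, key=lambda b: (b[1], b[0]))  # by top edge, then left
    return list(zip(ordered, lines))


def crop_lines(page_image_path, pairs):
    """Crop each box out of the page, returning (PIL image, text) pairs."""
    from PIL import Image  # local import: only needed for the actual cropping

    page = Image.open(page_image_path).convert("RGB")
    return [(page.crop(box), text) for box, text in pairs]
```

The count check is deliberate: if the detector merges or splits a line, every later pair would be shifted by one, so it is safer to fail loudly and inspect that page by hand.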

Of course, nothing stops you from training a VisionEncoderDecoderModel that takes in an entire PDF document and returns the entire text appearing in it.
