Getting poor word accuracy after fine-tuning TrOCR on Bangla

I have fine-tuned TrOCR for the Bangla language, using 1.8M word images for training and 0.4M word images for validation. The encoder is microsoft/beit-base-patch16-384 and the decoder is xlm-roberta-base, with training set to 11 epochs. The metrics logged for the best saved checkpoint are: loss: 0.8127, learning_rate: 4.6280169780882425e-05, epoch: 0.86, step: 20000, eval_loss: 7.68533182144165, eval_cer: 2.3218661516832686, eval_runtime: 14836.3588, eval_samples_per_second: 31.419, eval_steps_per_second: 0.393, epoch: 1.0, step: 23308. But when I test the saved model on 20k seen word images, it gives a word accuracy of only 40.6%. I followed https://github.com/NielsRogge/Transformers-Tutorials/blob/master/TrOCR/Fine_tune_TrOCR_on_IAM_Handwriting_Database_using_Seq2SeqTrainer.ipynb. Can you please tell me what the possible reasons are for such poor accuracy on seen data when eval_cer looks good, and how I can improve the accuracy of this fine-tuned model? Please help!
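
For reference, this is roughly how I compare the two metrics at test time (a minimal sketch; the strings below are placeholder examples, and I assume the `evaluate` library's CER metric, which is backed by jiwer):

```python
# Minimal sketch: CER vs. word accuracy on the same decoded test outputs.
# `predictions` / `references` stand in for the real strings produced by
# model.generate() + processor.batch_decode() on the test set.
import evaluate

cer_metric = evaluate.load("cer")

predictions = ["আমার", "নাম", "বাংলা"]    # hypothetical model outputs
references  = ["আমার", "নামে", "বাংলা"]   # hypothetical ground truth

# CER counts character-level edits, so a single wrong character in a word
# only adds a small fraction to the overall error rate.
cer = cer_metric.compute(predictions=predictions, references=references)

# Word accuracy is an exact string match, so the same single-character
# mistake makes the whole word count as wrong.
word_acc = sum(p == r for p, r in zip(predictions, references)) / len(references)

print(f"CER: {cer:.4f}, word accuracy: {word_acc:.2%}")
```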
@sgugger @nielsr @ydshieh @pierreguillou @IdoAmit198
Thanks in advance!

Actually, what I found while experimenting with TrOCR is that the accuracy largely depends on which encoder and decoder you're using. My aim was a multilingual OCR that included Bengali as well, and I used a synthetically generated dataset of 4M sentences. Can you please mention which models you are using as the encoder and decoder?

Sorry, I skipped the part where you mentioned the models. BEiT should work fine as the encoder, but try switching out XLM-RoBERTa as the decoder for something else. If you go through the architecture of the decoder in the TrOCR base checkpoint, you'll find that it's quite different from RoBERTa and instead quite similar to BART, so I would suggest going with some Bengali version of BART.
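
You can check this yourself by loading the stage-1 checkpoint and printing its decoder; a rough sketch:

```python
# Rough sketch: inspect which decoder the TrOCR stage-1 checkpoint actually uses.
from transformers import VisionEncoderDecoderModel

model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-stage1")

# The decoder is a TrOCRForCausalLM; its layout (a standard transformer decoder
# with cross-attention in every block) is much closer to BART's decoder than
# to a plain RoBERTa encoder stack.
print(type(model.decoder).__name__)   # TrOCRForCausalLM
print(model.decoder.config)
```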

Thanks for your reply @AnustupOCR … but there is no pre-trained BART model for the Bangla language!

I went through the paper and found that microsoft/trocr-base-stage1 was trained with a BEiT encoder and a RoBERTa decoder. So I used xlm-roberta-base (pre-trained on filtered CommonCrawl data covering 100 languages) as the decoder.
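
Concretely, I set the model up roughly as in the notebook, just with these two checkpoints (a minimal sketch; the special-token settings are simplified):

```python
# Minimal sketch of combining the two checkpoints, following the
# Seq2SeqTrainer notebook but with BEiT + XLM-RoBERTa swapped in.
from transformers import VisionEncoderDecoderModel, AutoTokenizer

model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
    "microsoft/beit-base-patch16-384",   # vision encoder
    "xlm-roberta-base",                  # text decoder
)
# Note: XLM-RoBERTa has no cross-attention of its own, so those layers are
# added here with random initialization and only get learned during fine-tuning.

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")

# Special-token and generation settings, as in the tutorial
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id
model.config.eos_token_id = tokenizer.sep_token_id
model.config.vocab_size = model.config.decoder.vocab_size
```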