Fine-tuning TrOCR on custom dataset

Hello, I am building an OCR system that uses TrOCR for text recognition, set up as shown below:

from torch import optim
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-printed")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-printed")
optimizer = optim.AdamW(model.parameters(), lr=5e-5)

Since the OCR system is intended for receipts, we chose the “printed” pretrained model. We use a dataset of 5,000 bounding boxes, each containing a single word. However, all metrics (CER, precision) and the loss increase with every epoch, and we can't figure out why the model performs worse the longer we train. This is how we train:

for epoch in range(self.epochs):
    self.model.train()
    train_loss = 0.0
    for batch in tqdm(self.train_dataloader):
        # Move every tensor in the batch to the training device
        for k, v in batch.items():
            batch[k] = v.to(self.device)
        outputs = self.model(**batch)
        loss = outputs.loss
        loss.backward()
        self.optimizer.step()
        self.optimizer.zero_grad()
        train_loss += loss.item()
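One detail worth showing explicitly: the model's built-in cross-entropy loss only ignores label positions set to -100, so padded label tensors need their padding token ids masked before being passed in. A minimal sketch of that step (the helper name and the concrete pad id of 1 are illustrative; in practice the id comes from `processor.tokenizer.pad_token_id`):

```python
import torch

def mask_pad_tokens(labels: torch.Tensor, pad_token_id: int) -> torch.Tensor:
    # Replace padding token ids with -100 so the loss inside
    # VisionEncoderDecoderModel ignores those positions.
    labels = labels.clone()
    labels[labels == pad_token_id] = -100
    return labels

labels = torch.tensor([[0, 52, 87, 2, 1, 1]])  # toy padded label sequence
masked = mask_pad_tokens(labels, pad_token_id=1)
print(masked.tolist())  # [[0, 52, 87, 2, -100, -100]]
```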

Does anyone have an idea of what we might be doing wrong?

It is performing poorly because you are training on top of an already fine-tuned model: microsoft/trocr-base-printed is fine-tuned on the SROIE dataset. You should start from a pre-trained TrOCR checkpoint such as microsoft/trocr-base-stage1 instead. You can pick one here: Models - Hugging Face.
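A minimal sketch of that swap, assuming you follow the usual Hugging Face TrOCR fine-tuning setup (the decoder/pad token id assignments are the commonly recommended configuration for stage1 checkpoints; adapt them to your tokenizer):

```python
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

# Load the pre-trained (not yet fine-tuned) stage1 checkpoint
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-stage1")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-stage1")

# stage1 checkpoints need these generation-related ids set before training
model.config.decoder_start_token_id = processor.tokenizer.cls_token_id
model.config.pad_token_id = processor.tokenizer.pad_token_id
model.config.vocab_size = model.config.decoder.vocab_size
```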