Fine-tuning TrOCR on a custom dataset

Hello, I am building an OCR system where I use TrOCR for the text recognition step, as shown below:

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-printed")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-printed")
optimizer = optim.AdamW(model.parameters(), lr=5e-5)

Since the OCR system is meant for receipts, we chose the "printed" pretrained model. We use a dataset of 5000 bounding boxes, where each box contains a single word. However, we observe that all metrics (CER, precision) and the loss get worse with every epoch we run. We can't figure out why the model performs more poorly after each epoch. This is how we train:
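For reference, by CER we mean character error rate: the Levenshtein edit distance between the prediction and the ground truth, divided by the length of the ground truth. A minimal plain-Python implementation we use for sanity checks (no external dependencies; in practice a library would do the same):

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance / reference length."""
    m, n = len(reference), len(hypothesis)
    # prev[j] holds the edit distance between reference[:i-1] and hypothesis[:j]
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[n] / max(m, 1)

print(cer("receipt", "reciept"))  # transposed letters count as two substitutions
```

Lower is better, so a CER that rises each epoch means the recognized text is drifting further from the labels.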

for epoch in range(self.epochs):
    train_loss = 0.0
    for batch in tqdm(self.train_dataloader):
        # Move every tensor in the batch to the training device
        for k, v in batch.items():
            batch[k] = v.to(self.device)
        outputs = self.model(**batch)
        loss = outputs.loss
        self.optimizer.zero_grad()  # reset gradients from the previous step
        loss.backward()             # backpropagate
        self.optimizer.step()       # apply the weight update
        train_loss += loss.item()
Does anyone have an idea of what we might be doing wrong?