Get original image from TrOCR processor

Hi @nielsr, I am following your TrOCR fine-tuning with PyTorch, and I have two questions. The processor resizes the image from 1700 x 134 to 384 x 384. 1) Is there a way to maintain the height of the original image, or even use a custom dimension for training, e.g. 512 x 134? 2) Is there a way to get the original image back for logging purposes, as the processed image is unrecognizable after those basic augmentations? Thanks


Thanks for your interest in TrOCR.

  1. You can use torchvision.transforms.Resize, which resizes an image while keeping its aspect ratio when you pass a single integer as the size (the shorter edge is then matched to that value).
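To make the aspect-ratio idea concrete, here is a minimal sketch of the size arithmetic, assuming you want the 1700 x 134 image to fit inside a 384 x 384 box and pad the rest. The helper names `fit_within` and `padding_to_box` are hypothetical, and this letterboxing approach is an illustration, not what the TrOCR processor does by default.

```python
def fit_within(width, height, max_width, max_height):
    """Return (new_width, new_height) that fits the box and keeps the aspect ratio."""
    scale = min(max_width / width, max_height / height)
    new_width = max(1, round(width * scale))
    new_height = max(1, round(height * scale))
    return new_width, new_height


def padding_to_box(new_width, new_height, box_width, box_height):
    """Return (left, top, right, bottom) padding that centers the resized image."""
    pad_w = box_width - new_width
    pad_h = box_height - new_height
    left, top = pad_w // 2, pad_h // 2
    return left, top, pad_w - left, pad_h - top


# Example from the question: a 1700 x 134 line image and a 384 x 384 target.
new_size = fit_within(1700, 134, 384, 384)
padding = padding_to_box(*new_size, 384, 384)
print(new_size, padding)
```

The computed size could then be passed to torchvision.transforms.Resize as (height, width), and the padding tuple to torchvision.transforms.Pad, which takes (left, top, right, bottom). Note how thin the text line becomes at this scale, which is one reason a wider custom resolution may be attractive.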

Technically, you could use a different resolution during training (I guess you mean fine-tuning) by interpolating the position embeddings of the vision encoder to the custom resolution. This can be achieved by passing interpolate_pos_encoding=True to the forward of VisionEncoderDecoderModel, which makes sure the vision encoder (ViT) interpolates its position embeddings to the custom resolution.
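As a rough illustration of why interpolation is needed, the sketch below counts the patches the encoder would see at each resolution. It assumes a patch size of 16 and non-overlapping patches with no padding, which should be checked against the config of the checkpoint you use; `patch_grid` is a hypothetical helper, not a library function.

```python
def patch_grid(height, width, patch_size=16):
    """Return (rows, cols, total) of non-overlapping patches; partial patches are dropped."""
    rows = height // patch_size
    cols = width // patch_size
    return rows, cols, rows * cols


# Resolution the checkpoint was pre-trained on vs. the custom one from the question.
pretrained = patch_grid(384, 384)
custom = patch_grid(134, 512)
print(pretrained, custom)
```

Under these assumptions the pre-trained position embeddings cover a 24 x 24 grid, while a 512 x 134 input yields an 8 x 32 grid, so the embeddings have to be resampled to the new shape. Since 134 is not a multiple of 16, a height such as 128 or 144 would avoid losing the last few pixel rows.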

  2. Sure, that’s possible: you just need to unnormalize the image in order to visualize it. See here for a function that you can use to visualize transformations.
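A minimal sketch of such an unnormalization step is shown below, written with NumPy so it works on `pixel_values` converted to an array. It assumes a mean and standard deviation of 0.5 per channel; in practice, take the actual values from your processor's configuration (typically exposed as image_mean and image_std). The function name `unnormalize` is hypothetical.

```python
import numpy as np


def unnormalize(pixel_values, mean, std):
    """Convert a normalized (C, H, W) float array back to an (H, W, C) uint8 image."""
    pixel_values = np.asarray(pixel_values, dtype=np.float32)
    mean = np.asarray(mean, dtype=np.float32).reshape(-1, 1, 1)
    std = np.asarray(std, dtype=np.float32).reshape(-1, 1, 1)
    image = pixel_values * std + mean  # undo (x - mean) / std, back to [0, 1]
    image = np.clip(image * 255.0, 0, 255).round().astype(np.uint8)
    return image.transpose(1, 2, 0)  # channels-last for plotting or logging


# Round-trip check on a small random image, using the assumed 0.5 mean and std.
rng = np.random.default_rng(0)
original = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)
normalized = ((original / 255.0 - 0.5) / 0.5).transpose(2, 0, 1)
restored = unnormalize(normalized, mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
```

This recovers the resized 384 x 384 image, not the original 1700 x 134 one, because resizing is lossy. If you need the original for logging, the simplest option is to keep a reference to the image before passing it to the processor.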