Hi @nielsr, I am following your TrOCR fine-tuning with PyTorch, but I have two questions. The processor resizes the image from 1700 x 134 to 384 x 384: 1) is there a way to maintain the height of the original image, or even use a custom dimension for training, e.g. 512 x 134? And 2) is there a way to get the original image back for logging purposes, as the processed image is unrecognizable after those basic augmentations? Thanks
Hi,
thanks for your interest in TrOCR.
- what you can do is use torchvision.transforms.Resize, which lets you resize an image while keeping its aspect ratio (pass a single int and the shorter side is resized to it, with the other side scaled proportionally).
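For a wide text-line image like yours (1700 x 134), the shorter side is the height, so resizing the shorter side fixes the height while the width follows. A minimal sketch of that logic using only PIL (it mirrors what torchvision.transforms.Resize does when given a single int; the function name and the target height of 64 are just illustrative choices):

```python
from PIL import Image

def resize_keep_aspect(image, target_height):
    # Scale the height to target_height and let the width follow
    # proportionally, so the aspect ratio is preserved.
    w, h = image.size
    new_w = round(w * target_height / h)
    return image.resize((new_w, target_height), Image.BILINEAR)

# Dummy image with the dimensions from the question.
img = Image.new("RGB", (1700, 134))
resized = resize_keep_aspect(img, 64)
print(resized.size)  # (812, 64)
```

Note that this only controls the resize step; the model still expects whatever resolution its position embeddings were trained for, which is what the interpolation flag below addresses.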
Technically you could use a different resolution during training (I guess you mean fine-tuning) by interpolating the position embeddings of the vision encoder to the custom resolution. This can be achieved by passing interpolate_pos_encoding=True to the forward of VisionEncoderDecoderModel, which makes sure the vision encoder (ViT) interpolates its position embeddings to the custom resolution.
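To make the interpolation concrete: TrOCR's ViT encoder was pretrained at 384 x 384 with 16 x 16 patches, i.e. a 24 x 24 grid of position embeddings. At a custom resolution the patch grid changes shape, and interpolate_pos_encoding=True tells ViT to bilinearly interpolate the 24 x 24 grid to the new one. A small sketch of the grid arithmetic (the helper name is mine; 128 x 1696 is just an example of a wide text-line resolution, and both sides must be multiples of the patch size):

```python
def pos_embed_grid(height, width, patch_size=16):
    # Number of position embeddings along each axis for a ViT-style
    # encoder: one embedding per patch_size x patch_size patch.
    if height % patch_size or width % patch_size:
        raise ValueError("each side must be a multiple of the patch size")
    return height // patch_size, width // patch_size

print(pos_embed_grid(384, 384))    # (24, 24) -> the pretraining grid
print(pos_embed_grid(128, 1696))   # (8, 106) -> a custom wide-image grid
```

So a forward pass like `model(pixel_values, labels=labels, interpolate_pos_encoding=True)` would resample the 24 x 24 embeddings to, e.g., 8 x 106 on the fly.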
- sure, that’s possible: you just need to unnormalize the image in order to visualize it. See here for a function that you can use to visualize transformations.
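Unnormalizing just inverts the `(x - mean) / std` step the processor applies. A minimal sketch, assuming the mean/std of 0.5 that TrOCR's default image processor uses (check your processor's `image_mean` / `image_std` and substitute if they differ):

```python
import numpy as np

def unnormalize(pixel_values, mean=0.5, std=0.5):
    # Invert (x - mean) / std, map back to 0-255 uint8, and move the
    # channel axis last (CHW -> HWC) so the array can be viewed/saved.
    img = pixel_values * std + mean
    img = np.clip(img * 255, 0, 255).astype(np.uint8)
    return img.transpose(1, 2, 0)

# Dummy normalized tensor shaped like the processor output (3, 384, 384);
# all zeros, i.e. every pixel at the normalization mean.
dummy = np.zeros((3, 384, 384), dtype=np.float32)
restored = unnormalize(dummy)
print(restored.shape)  # (384, 384, 3)
```

You can then pass the result to `PIL.Image.fromarray` to log it. The augmented geometry (crops, rotations, the resize itself) is of course not undone, only the normalization.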